
In this workshop, join Machine Learning Research Engineer Sachin Sharma and learn how to use Nvidia's Triton Inference Server (formerly known as TensorRT Inference Server), which simplifies the deployment of AI models at scale in production. We focus on hosting multiple trained models (TensorFlow, PyTorch) on a single Triton Inference Server to leverage its full potential. Once the models are deployed, we can send inference requests and receive predictions back.


To keep the workshop running smoothly, attendees should install a few packages beforehand:

  1. Install Docker (https://docs.docker.com/get-docker/)
  2. Pull the Triton Inference Server Docker image from Nvidia NGC
  3. Note: the image is about 10.6 GB (10–15 minutes to download, depending on your internet connection)
  4. To view the downloaded Docker image, run: docker images
  5. The repository we will follow throughout the workshop (optional): https://github.com/sachinsharma9780/AI-Enterprise-Workshop-Building-ML-Pipelines
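The pull-and-run steps above can be sketched as follows. The image tag (21.08-py3) and the local model repository path are assumptions for illustration; use the tag and paths given in the workshop repository.

```shell
# Pull the Triton server image from Nvidia NGC (tag is an example; pick a current one)
docker pull nvcr.io/nvidia/tritonserver:21.08-py3

# Verify the download
docker images

# Launch Triton, mounting a local model repository into the container.
# Ports: 8000 = HTTP inference, 8001 = gRPC, 8002 = Prometheus metrics (/metrics)
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v $(pwd)/model_repository:/models \
  nvcr.io/nvidia/tritonserver:21.08-py3 \
  tritonserver --model-repository=/models
```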

The workshop covers:

– Introduction to ArangoDB and Nvidia's Triton Inference Server (need, features, applications, etc.)
– Setting up the Triton Inference Server on a local machine
– Deploying your first trained model (TensorFlow) on the Triton Inference Server, with an application to image classification
– Deploying almost any Hugging Face PyTorch model on the Triton Inference Server, with an application to zero-shot text classification (here we convert the given PyTorch models into models Triton accepts)
– Writing a Python client-side script to interact with the Triton server once the models are deployed (i.e., sending requests and receiving predictions)
– Exploring the image_client.py script to make an image classification request
– Writing our own client-side script to interact with NLP models
– Triton metrics
– Storing inference results in ArangoDB using python-arango
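Triton discovers models through a model repository with a fixed layout: each model directory holds numbered version folders and a config.pbtxt describing the model's platform and tensors. A minimal sketch for a TensorFlow image-classification model is shown below; the model name, platform, and tensor names are illustrative assumptions, not the workshop's exact files.

```
# model_repository/inception_graphdef/config.pbtxt
# (the model file itself lives at model_repository/inception_graphdef/1/model.graphdef)
name: "inception_graphdef"
platform: "tensorflow_graphdef"
max_batch_size: 1
input [
  {
    name: "input"
    data_type: TYPE_FP32
    format: FORMAT_NHWC
    dims: [ 299, 299, 3 ]
  }
]
output [
  {
    name: "InceptionV3/Predictions/Softmax"
    data_type: TYPE_FP32
    dims: [ 1001 ]
  }
]
```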
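The client-side interaction can be sketched with the tritonclient HTTP API. This is a minimal sketch assuming a running server at localhost:8000 and a deployed model named densenet_onnx with input data_0 and output fc6_1; your model, tensor names, and preprocessing will differ.

```python
import numpy as np


def preprocess(image, mean=0.485, std=0.229):
    """Scale an HxWx3 uint8 image to a normalized float32 CHW array."""
    x = image.astype(np.float32) / 255.0
    x = (x - mean) / std
    return np.transpose(x, (2, 0, 1))  # HWC -> CHW


def classify(image, url="localhost:8000", model="densenet_onnx"):
    """Send one image to Triton over HTTP and return the raw output tensor."""
    # Installed with: pip install tritonclient[http]
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url=url)
    data = preprocess(image)[np.newaxis, ...]  # add a batch dimension

    inp = httpclient.InferInput("data_0", list(data.shape), "FP32")
    inp.set_data_from_numpy(data)
    out = httpclient.InferRequestedOutput("fc6_1")

    result = client.infer(model_name=model, inputs=[inp], outputs=[out])
    return result.as_numpy("fc6_1")
```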
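Storing inference results with python-arango can be sketched as below. The database name, collection name, credentials, and document schema are assumptions for illustration.

```python
def to_document(text, labels, scores):
    """Bundle a zero-shot classification result into an ArangoDB document."""
    return {
        "text": text,
        "predictions": [
            {"label": label, "score": float(score)}
            for label, score in zip(labels, scores)
        ],
    }


def store_result(doc, url="http://localhost:8529", db_name="triton_demo",
                 collection="inference_results"):
    """Insert one result document, creating the collection if needed."""
    # Installed with: pip install python-arango
    from arango import ArangoClient

    client = ArangoClient(hosts=url)
    db = client.db(db_name, username="root", password="")
    col = (db.collection(collection) if db.has_collection(collection)
           else db.create_collection(collection))
    return col.insert(doc)
```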

Sachin Sharma

About the Presenter:

Sachin is a Machine Learning Research Engineer at ArangoDB whose aim is to build intelligent products through thorough research and engineering in the area of Graph Machine Learning. He completed his Master's degree in Computer Science with a specialization in Intelligent Systems. He is an AI enthusiast who conducted research in the areas of Computer Vision, NLP, and Graph Neural Networks at DFKI (German Research Centre for AI) during his academic career. Sachin also built Machine Learning pipelines at Define Media GmbH, where he worked as a Machine Learning Engineer and Scientist.