- Duke Spring 2024 IDS721 Final Project
- Group Members: Daniel, Emily, Yilin, Hiep
- video demo: https://youtu.be/VpOEaIy_U44
In this project, we operationalize machine learning by serving an open-source model through a web service developed in Rust. This involves containerizing the service for deployment on Kubernetes and automating the workflow with a CI/CD pipeline. Monitoring, metrics collection, and thorough documentation are essential components of the project, alongside a clear and concise demonstration of the application through a YouTube video.
This project demonstrates the integration of a machine learning model using Rust with Actix Web to serve model predictions over a web service. Specifically, it uses the bloom-560m-q5_1-ggjt.bin model from Rustformers (based on the Hugging Face models) to perform text generation tasks. The application is designed to be deployed on AWS Elastic Kubernetes Service (EKS) for scalable and robust operation. Be sure to download the model and put the file in the src folder along with main.rs before running the project.
- Obtain Open Source ML Model: We select and acquire an open-source machine learning model suitable for serving. The model is capable of providing inferences based on input data it receives.
- Create Rust Web Service for Model Inferences: We develop a web service in Rust that can serve the ML model's inferences. The service is robust, efficiently handling requests and providing accurate responses from the model.
- Containerize Service and Deploy to Kubernetes: We containerize the Rust web service using Docker, ensuring it is well-prepared for deployment. We then deploy the containerized service to a Kubernetes cluster, configuring it for scalability and reliability.
- Implement CI/CD Pipeline: We establish a Continuous Integration and Continuous Deployment (CI/CD) pipeline to automate the testing, building, and deployment processes of the application. This pipeline also supports rapid iteration and deployment of changes to the service.
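As a concrete illustration, such a pipeline could be sketched as a GitHub Actions workflow. The file layout, job names, and `DOCKERHUB_*` secret names below are assumptions for the sketch, not the project's actual configuration:

```yaml
# Hypothetical .github/workflows/ci.yml sketch
name: CI/CD
on:
  push:
    branches: [main]
jobs:
  build-test-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Build and test the Rust service
      - run: cargo build --release
      - run: cargo test
      # Build, tag, and push the backend image (DOCKERHUB_* are assumed secrets)
      - run: docker build -t cr7forever666/artix_backend:latest .
      - run: echo "${{ secrets.DOCKERHUB_TOKEN }}" | docker login -u "${{ secrets.DOCKERHUB_USER }}" --password-stdin
      - run: docker push cr7forever666/artix_backend:latest
```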
- main.rs: Sets up the Actix Web server, routes, and CORS configuration.
- core.rs: Contains the logic for loading the model, performing inference, and handling errors.
- /src: Contains the model binaries and tokenizer configurations.
- Cargo.toml: Manages Rust dependencies and project settings.
- bloom-560m-q5_1-ggjt.bin: Rustformers model file.
- Download the bloom-560m-q5_1-ggjt.bin file from https://huggingface.co/rustformers/bloom-ggml/blob/main/bloom-560m-q5_1-ggjt.bin, and put it under the src directory before running the project.
- Run backend: go to the artix_backend folder and run:

  ```
  cargo run
  ```

- To test the backend locally:

  ```
  curl -X POST http://127.0.0.1:8080/ -H "Content-Type: application/json" -d '{"message": "Hello, A"}'
  ```

  The result should be JSON with response as the key: {"response":"Hello, Aunt Sally."}
- Create Docker image:

  ```
  docker build -t rs-model-frontend .
  ```

- Deploy frontend on Docker:

  ```
  docker run -p 3000:3000 rs-model-frontend
  ```
Demo deployment of backend and frontend:

- Build the Docker image:

  ```
  docker build -t artix-backend .
  ```

- Tag the Docker image for Docker Hub:

  ```
  docker tag artix-backend:latest cr7forever666/artix_backend:latest
  ```

- Run the Docker image locally:

  ```
  docker run -p 5000:5000 artix-backend:latest
  ```

- Push the Docker image to Docker Hub:

  ```
  docker push cr7forever666/artix_backend:latest
  ```

Link to Docker image: https://hub.docker.com/r/cr7forever666/artix_backend

- Prepare Your AWS Environment (Install AWS CLI, kubectl, eksctl, etc.)
- Create an EKS Cluster:

  ```
  eksctl create cluster --name my-cluster --region us-west-2 --nodegroup-name my-nodes --node-type t3.medium --nodes 3 --nodes-min 1 --nodes-max 4 --managed
  ```

- Configure kubectl to Connect to Your EKS Cluster (substitute your own region and cluster name, e.g. us-west-2 and my-cluster from the command above):

  ```
  aws eks --region region-code update-kubeconfig --name cluster-name
  ```

- Create a Kubernetes Deployment (deployment.yaml):
  ```
  apiVersion: apps/v1
  kind: Deployment
  metadata:
    name: artix-backend-deployment
  spec:
    replicas: 2
    selector:
      matchLabels:
        app: artix-backend
    template:
      metadata:
        labels:
          app: artix-backend
      spec:
        containers:
          - name: artix-backend
            image: yourusername/artix_backend:latest
            ports:
              - containerPort: 8000
  ```

- Create a Kubernetes Service (service.yaml):
  ```
  apiVersion: v1
  kind: Service
  metadata:
    name: artix-backend-service
  spec:
    type: LoadBalancer
    ports:
      - port: 80
        targetPort: 8000
        protocol: TCP
    selector:
      app: artix-backend
  ```

- Apply the Deployment and Service:
  ```
  kubectl apply -f deployment.yaml
  kubectl apply -f service.yaml
  ```

- Access the Service:

  ```
  kubectl get services
  ```

