This project demonstrates the complete MLOps lifecycle in practice, from raw data collection to the deployment of a containerized inference service. The goal was to build a "factory" for fine-tuning Large Language Models (LLMs) to learn to speak like a specific character—in this case, Rick Sanchez from Rick and Morty.
This project is not just about fine-tuning; it's a case study on building an end-to-end, robust, and reproducible AI system.
- Automated Data Pipeline (ETL): A scraper built with `requests` + `BeautifulSoup` extracts, transforms (cleans and formats), and loads Rick and Morty scripts into a clean text corpus ready for training.
- LLM Fine-Tuning: Leverages the Hugging Face ecosystem (`transformers`, `datasets`, `accelerate`) to fine-tune a base model (`distilgpt2`) on the custom dataset, teaching it to capture the character's style and personality.
- Optimized Inference Service: The fine-tuned model is served via a high-performance FastAPI service. The model is loaded only once, on application startup, to ensure low-latency requests.
- Containerization for Portability: The inference application, including the trained model artifact, is packaged into a Docker image, ensuring the service is portable and can run consistently in any environment. The build context is kept lean with a `.dockerignore` file.
- CI/CD & Code Quality: A GitHub Actions pipeline validates the API code's quality (`Ruff`) and correctness (`Pytest`) on every push, ensuring service integrity.
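The "loaded only once on startup" behavior can be sketched in plain Python. This is an illustrative sketch, not the project's actual serving code: the dummy `load` stands in for the real Hugging Face pipeline, and the FastAPI wiring is omitted.

```python
from functools import lru_cache

LOAD_COUNT = 0  # tracks how many times the expensive load actually runs


@lru_cache(maxsize=1)
def get_model():
    """Load the model exactly once; later calls return the cached instance."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    # In the real service this would be something like a Hugging Face
    # text-generation pipeline loaded from the rick-llm-final artifact.
    return object()  # stand-in for the loaded model


# Every request handler calls get_model(), but only the first call pays the cost.
first = get_model()
second = get_model()
```

In FastAPI itself, the same effect is usually achieved by loading the model in a startup/lifespan handler and storing it on `app.state`, so no individual request pays the load cost.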
This project implements the key phases of an MLOps pipeline for custom model creation:
```mermaid
graph TD
    A["Web Source (Transcripts)"] --> B{"1. Data Pipeline (scraper.py)"};
    B -- "Cleans & Formats" --> C["Clean Corpus (rick_corpus.txt)"];
    C --> D{"2. Fine-Tuning Script (trainer.py)"};
    subgraph "Hugging Face"
        D -- Uses --> HF_Base("Base LLM - e.g., distilgpt2");
    end
    D -- Saves --> E["Fine-Tuned Model (rick-llm-final)"];
    E --> F{"3. Inference Service (FastAPI)"};
    subgraph "Docker Container"
        F -- "Loads Model" --> E;
    end
    F --> G["API Endpoint (/generate)"];
    H["User"] --> G;
    G --> H;
```
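The transform step of the data pipeline can be sketched as below. This is a hypothetical illustration: the `Rick: ...` dialogue format and the function name are assumptions, and the real `scraper.py` additionally fetches and parses the pages with `requests` + `BeautifulSoup`.

```python
import re


def extract_character_lines(transcript: str, character: str = "Rick") -> list[str]:
    """Keep only the given character's dialogue lines, stripped of stage directions."""
    lines = []
    for raw in transcript.splitlines():
        match = re.match(rf"\s*{character}\s*:\s*(.+)", raw)
        if match:
            # Remove parenthesized stage directions like "(burps)".
            line = re.sub(r"\([^)]*\)", "", match.group(1)).strip()
            if line:
                lines.append(line)
    return lines


sample = """
Morty: Aw geez, Rick.
Rick: (burps) Listen, Morty, science isn't about why.
Rick: It's about why not!
"""
corpus = extract_character_lines(sample)
```

Each cleaned line can then be appended to the corpus file, one utterance per line.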
- Git, Python 3.9+, Docker Desktop
This option uses the already fine-tuned model included in this repository.
- Clone the repository:

  ```bash
  git clone https://github.com/PRYSKAS/character_llm_factory.git
  cd character_llm_factory
  ```

- Build the Docker image:

  ```bash
  docker build -t rick-llm-service .
  ```

- Run the container:

  ```bash
  docker run -d -p 8001:8001 --name rick-llm rick-llm-service
  ```

Access the API at `http://127.0.0.1:8001/docs` to interact with the Rick LLM.
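With the container running, the service can also be queried from Python. A sketch, assuming a JSON body with a `prompt` field (only the `/generate` path appears in this README; the payload shape here is a guess):

```python
import json
import urllib.request


def build_generate_request(prompt: str,
                           base_url: str = "http://127.0.0.1:8001") -> urllib.request.Request:
    """Build a POST request for the /generate endpoint (payload shape is assumed)."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_generate_request("Morty, listen to me.")
# With the service up: urllib.request.urlopen(req).read() returns the response body.
```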
Follow these steps if you want to run the data pipeline and fine-tuning process from scratch.
- Clone and install dependencies:

  ```bash
  git clone https://github.com/PRYSKAS/character_llm_factory.git
  cd character_llm_factory
  pip install -r requirements.txt
  ```

- Run the Data Pipeline:

  ```bash
  python -m data_pipeline.scraper
  ```

  This will generate the `rick_corpus.txt` file.

- Run the Fine-Tuning Process:

  ```bash
  python -m finetuning.trainer
  ```

  This will train the model and save the artifacts in the `rick-llm-final` folder.

- Start the inference service locally:

  ```bash
  uvicorn serving.main:app --reload --port 8001
  ```
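A common preprocessing step inside causal-LM fine-tuning scripts like `trainer.py` is grouping the tokenized corpus into fixed-length blocks. The sketch below is illustrative (integer stand-ins for token IDs, hypothetical helper name), not the project's actual code:

```python
def group_into_blocks(token_ids: list[int], block_size: int) -> list[list[int]]:
    """Concatenate token IDs and split into equal-sized blocks, dropping the
    remainder — mirroring the standard Hugging Face causal-LM recipe."""
    total = (len(token_ids) // block_size) * block_size
    return [token_ids[i:i + block_size] for i in range(0, total, block_size)]


blocks = group_into_blocks(list(range(10)), block_size=4)
# blocks -> [[0, 1, 2, 3], [4, 5, 6, 7]]; the trailing 2 tokens are dropped
```

Each block then serves as one training example, with the model learning to predict every token from the ones before it.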