A complete, production-grade MLOps environment for teaching real-world machine learning operations. This workshop demonstrates an end-to-end ML pipeline for customer churn prediction using industry-standard tools.
- Overview
- Architecture
- Prerequisites
- Quick Start
- Project Structure
- Components
- ML Pipeline
- Access URLs
- Workshop Guide
- Troubleshooting
This workshop provides a hands-on experience with:
| Component | Technology | Purpose |
|---|---|---|
| Container Orchestration | Kubernetes (Kind) | 3-node cluster for production-like environment |
| Workflow Orchestration | Apache Airflow | DAG-based ML pipeline automation |
| Experiment Tracking | MLflow | Model versioning, metrics, and registry |
| Object Storage | MinIO | S3-compatible data lake and artifact storage |
| Database | PostgreSQL | Backend for MLflow and Airflow |
| Feature Store | Redis | Online feature serving cache |
| Model Serving | FastAPI | REST API with A/B testing |
| Frontend | React + TailwindCSS | Real-time prediction dashboard |
We use the IBM Telco Customer Churn dataset (verified, real-world data) to demonstrate:
- Data ingestion from external sources
- Feature engineering pipeline
- Multi-model training (Logistic Regression, Random Forest, Gradient Boosting)
- A/B testing between model versions
- Real-time prediction serving
```
┌────────────────────────────────────────────────────────────────────────────┐
│                          KIND KUBERNETES CLUSTER                           │
│                       (1 Control Plane + 2 Workers)                        │
├────────────────────────────────────────────────────────────────────────────┤
│                                                                            │
│   ┌──────────────┐     ┌──────────────┐     ┌──────────────┐               │
│   │   AIRFLOW    │────▶│    MINIO     │◀────│    MLFLOW    │               │
│   │  Scheduler   │     │  Data Lake   │     │   Tracking   │               │
│   │  Webserver   │     │  Artifacts   │     │   Registry   │               │
│   └──────────────┘     └──────────────┘     └──────────────┘               │
│          │                                          │                      │
│          │              ┌──────────────┐            │                      │
│          └─────────────▶│  POSTGRESQL  │◀───────────┘                      │
│                         │   Backend    │                                   │
│                         └──────────────┘                                   │
│                                │                                           │
│   ┌────────────────────────────┴───────────────────────────┐               │
│   │                  MODEL SERVING LAYER                   │               │
│   │  ┌───────────┐    ┌────────────┐    ┌─────────┐        │               │
│   │  │  FastAPI  │───▶│ A/B Router │───▶│  Redis  │        │               │
│   │  │ /predict  │    │  80%/20%   │    │  Cache  │        │               │
│   │  └───────────┘    └────────────┘    └─────────┘        │               │
│   └────────────────────────────┬───────────────────────────┘               │
│                                │                                           │
│   ┌────────────────────────────┴───────────────────────────┐               │
│   │              FRONTEND DASHBOARD (React)                │               │
│   │  • Real-time Predictions    • A/B Test Visualization   │               │
│   │  • Model Comparison         • Feature Importance       │               │
│   └────────────────────────────────────────────────────────┘               │
└────────────────────────────────────────────────────────────────────────────┘
```
Before starting, ensure you have:
- Docker Desktop - Download
- kubectl - Install Guide
- Kind - Install Guide
- Python 3.10+ - Download
- Node.js 18+ (for frontend development) - Download
```powershell
# Install Kind (using Chocolatey)
choco install kind -y

# Install kubectl
choco install kubernetes-cli -y

# Verify installations
kind version
kubectl version --client
docker version
```

```powershell
cd F:\MLOPS-Fundamentals\mlops-lab-workshop

# Option A: Deploy all at once
.\scripts\deploy-all.ps1

# Option B: Step-by-step deployment
.\scripts\01-create-cluster.ps1          # Create Kind cluster
.\scripts\02-deploy-infrastructure.ps1   # Deploy MinIO, PostgreSQL
.\scripts\03-deploy-mlflow.ps1           # Deploy MLflow
.\scripts\04-deploy-airflow.ps1          # Deploy Airflow
.\scripts\05-deploy-ml-pipeline.ps1      # Deploy ML serving components
```

```powershell
# Start all port-forwards
.\scripts\port-forward.ps1
```

```powershell
# Run the complete ML pipeline
.\scripts\06-run-ml-pipeline.ps1
```
```
mlops-lab-workshop/
├── 📁 k8s/                          # Kubernetes manifests
│   ├── airflow/                     # Airflow deployment
│   ├── mlflow/                      # MLflow deployment
│   ├── minio/                       # MinIO (S3) deployment
│   ├── postgres/                    # PostgreSQL deployment
│   ├── redis/                       # Redis deployment
│   ├── model-serving/               # FastAPI model serving
│   ├── frontend/                    # React dashboard
│   └── namespaces/                  # Namespace definitions
│
├── 📁 src/                          # Source code
│   ├── data/                        # Data ingestion scripts
│   │   └── download_data.py
│   ├── features/                    # Feature engineering
│   │   └── feature_engineering.py
│   ├── training/                    # Model training
│   │   └── train_model.py
│   ├── serving/                     # FastAPI application
│   │   ├── app/
│   │   │   ├── main.py              # API endpoints
│   │   │   ├── predictor.py         # Model inference
│   │   │   └── ab_router.py         # A/B testing logic
│   │   ├── Dockerfile
│   │   └── requirements.txt
│   └── frontend/                    # React dashboard
│       ├── src/
│       ├── package.json
│       └── Dockerfile
│
├── 📁 dags/                         # Airflow DAGs
│   ├── 01_data_ingestion.py         # Download & store data
│   ├── 02_feature_engineering.py    # Transform features
│   ├── 03_model_training.py         # Train & register models
│   ├── 04_model_evaluation.py       # Evaluate & update A/B
│   └── 05_full_pipeline.py          # End-to-end orchestration
│
├── 📁 scripts/                      # Deployment scripts
│   ├── deploy-all.ps1               # One-click deployment
│   ├── 01-create-cluster.ps1
│   ├── 02-deploy-infrastructure.ps1
│   ├── 03-deploy-mlflow.ps1
│   ├── 04-deploy-airflow.ps1
│   ├── 05-deploy-ml-pipeline.ps1
│   ├── 06-run-ml-pipeline.ps1
│   ├── port-forward.ps1             # Access all services
│   ├── status.ps1                   # Check cluster status
│   └── cleanup.ps1                  # Remove everything
│
├── 📁 github-dags-template/         # Template for GitHub DAGs repo
├── kind-cluster-config.yaml         # Kind cluster configuration
├── requirements.txt                 # Python dependencies
└── README.md                        # This file
```
MinIO provides S3-compatible storage for:
- Raw Data: `data-lake/raw/telco_churn/`
- Processed Features: `data-lake/processed/features/`
- Model Artifacts: `mlflow-artifacts/`
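Because MinIO speaks the S3 protocol, any S3 client can read and write these buckets. A minimal sketch using `boto3` (the endpoint and credentials below are this workshop's defaults from the Access URLs table; adjust if your deployment differs, and note `object_key` is an illustrative helper, not part of the repo):

```python
# Workshop defaults (assumptions -- adjust to your deployment)
MINIO_ENDPOINT = "http://localhost:9000"
ACCESS_KEY = "minioadmin"
SECRET_KEY = "minioadmin123"


def object_key(stage: str, dataset: str, filename: str) -> str:
    """Build a data-lake key such as 'raw/telco_churn/churn.csv'."""
    return f"{stage}/{dataset}/{filename}"


def upload_raw(local_path: str, filename: str) -> None:
    """Upload a local file into the data-lake bucket on MinIO."""
    import boto3  # imported here so the helper above works without boto3

    s3 = boto3.client(
        "s3",
        endpoint_url=MINIO_ENDPOINT,
        aws_access_key_id=ACCESS_KEY,
        aws_secret_access_key=SECRET_KEY,
    )
    s3.upload_file(local_path, "data-lake", object_key("raw", "telco_churn", filename))
```

The same client pattern works for reading processed features back out with `s3.download_file`.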
Transforms raw customer data into ML features:
| Feature Category | Examples |
|---|---|
| Tenure | tenure_months, is_new_customer, is_loyal_customer |
| Charges | monthly_charges, avg_monthly_charge, charge_per_tenure |
| Services | total_services, has_internet, has_streaming |
| Contract | contract_month_to_month, payment_electronic |
| Risk | risk_score (calculated heuristic) |
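To make the table concrete, here is a self-contained pandas sketch of a few of these derived features. Column names follow the IBM Telco schema; the thresholds and risk-score weights are illustrative assumptions, not the workshop's actual `feature_engineering.py`:

```python
import pandas as pd


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive sample tenure/charges/contract/risk features."""
    out = df.copy()
    out["tenure_months"] = out["tenure"]
    out["is_new_customer"] = (out["tenure"] <= 6).astype(int)
    out["is_loyal_customer"] = (out["tenure"] >= 48).astype(int)
    # Guard against division by zero for brand-new customers
    out["charge_per_tenure"] = out["MonthlyCharges"] / out["tenure"].clip(lower=1)
    out["contract_month_to_month"] = (out["Contract"] == "Month-to-month").astype(int)
    # Simple additive heuristic (illustrative weights only)
    out["risk_score"] = (
        out["is_new_customer"]
        + out["contract_month_to_month"]
        + (out["MonthlyCharges"] > 70).astype(int)
    )
    return out


sample = pd.DataFrame({
    "tenure": [2, 60],
    "MonthlyCharges": [85.0, 25.0],
    "Contract": ["Month-to-month", "Two year"],
})
features = engineer_features(sample)
```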
Three models trained and compared:
| Model | Description | Typical AUC |
|---|---|---|
| Logistic Regression | Baseline, interpretable | ~0.82 |
| Random Forest | Ensemble, good accuracy | ~0.85 |
| Gradient Boosting | Best performance | ~0.86 |
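A self-contained sketch of the train-and-compare loop, on synthetic data standing in for the churn features (the workshop's `train_model.py` would additionally log each run to MLflow with `mlflow.log_metric` and register the winner):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered churn features
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

# Fit each model and score it on held-out AUC
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]
    scores[name] = roc_auc_score(y_te, proba)

# The best AUC becomes the champion candidate
champion = max(scores, key=scores.get)
```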
- Champion Model: 80% traffic (best performer)
- Challenger Model: 20% traffic (experimental)
- Real-time metrics tracking
- Automatic promotion based on performance
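One common way to implement the 80/20 split is deterministic hashing of the customer ID, so each customer is pinned to the same variant across requests. The sketch below shows that idea; it mirrors the workshop's champion/challenger split but is not necessarily how `ab_router.py` is written:

```python
import hashlib


def route_model(customer_id: str, challenger_share: float = 0.20) -> str:
    """Deterministically route a customer to champion or challenger.

    Hashing the ID (rather than calling random.random()) keeps a given
    customer on the same variant for every request, which makes A/B
    metrics comparable across sessions.
    """
    digest = hashlib.md5(customer_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100  # uniform bucket in 0..99
    return "challenger" if bucket < challenger_share * 100 else "champion"


# Over many customers the split converges to roughly 80/20
counts = {"champion": 0, "challenger": 0}
for i in range(10_000):
    counts[route_model(f"customer-{i}")] += 1
```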
FastAPI endpoints:
| Endpoint | Method | Description |
|---|---|---|
| `/predict` | POST | Single customer prediction |
| `/predict/batch` | POST | Batch predictions |
| `/predict/explain` | POST | Prediction with feature importance |
| `/models` | GET | List deployed models |
| `/ab-stats` | GET | A/B testing statistics |
| `/health` | GET | Health check |
| `/metrics` | GET | Prometheus metrics |
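A client-side sketch of calling `/predict` once port-forwarding is active, using only the standard library. The payload fields here are illustrative assumptions; check the interactive docs at http://localhost:8000/docs for the actual request schema:

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # reachable after .\scripts\port-forward.ps1


def build_request(payload: dict, path: str = "/predict") -> urllib.request.Request:
    """Build a JSON POST request for the model-serving API."""
    return urllib.request.Request(
        API_URL + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Illustrative payload -- see /docs for the real schema
    customer = {"tenure": 2, "MonthlyCharges": 85.0, "Contract": "Month-to-month"}
    with urllib.request.urlopen(build_request(customer)) as resp:
        print(json.load(resp))
```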
React-based dashboard with:
- Real-time churn predictions
- Model performance comparison
- A/B test visualization
- Feature importance charts
| DAG | Schedule | Description |
|---|---|---|
| `01_data_ingestion` | Daily | Download dataset from IBM, upload to MinIO |
| `02_feature_engineering` | Daily | Transform raw data to features |
| `03_model_training` | Weekly | Train models, register in MLflow |
| `04_model_evaluation` | Daily | Compare models, update A/B config |
| `05_full_pipeline` | Weekly | End-to-end orchestration |
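Each DAG is an ordinary Python file in `dags/`. As a rough sketch of the shape such a file takes (task IDs and callables here are illustrative, not the workshop's actual code; requires Apache Airflow 2.4+ for the `schedule` argument):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_raw():
    ...  # e.g. read the raw CSV from MinIO


def build_features():
    ...  # e.g. transform and write features back to MinIO


with DAG(
    dag_id="02_feature_engineering",
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_raw", python_callable=extract_raw)
    features = PythonOperator(task_id="build_features", python_callable=build_features)
    extract >> features  # run feature building after extraction
```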
Option 1: Via Airflow UI
- Open http://localhost:8080
- Enable the `05_full_pipeline` DAG
- Trigger manually or wait for schedule

Option 2: Via Script

```powershell
.\scripts\06-run-ml-pipeline.ps1
```

After running `.\scripts\port-forward.ps1`:
| Service | URL | Credentials |
|---|---|---|
| Airflow UI | http://localhost:8080 | admin / admin123 |
| MLflow UI | http://localhost:5000 | (no auth) |
| MinIO Console | http://localhost:9001 | minioadmin / minioadmin123 |
| Model API Docs | http://localhost:8000/docs | (no auth) |
| Dashboard | http://localhost:3000 | (no auth) |
- Create Kind cluster
- Deploy MinIO and PostgreSQL
- Explore Kubernetes resources
- Deploy MLflow
- Run training script
- Compare experiments in UI
- Understand model registry
- Explore data ingestion DAG
- Run feature engineering
- Understand feature store concepts
- Train multiple models
- Compare metrics
- Promote to production
- Version management
- Deploy FastAPI service
- Make predictions
- Understand A/B routing
- Monitor performance
- Explore dashboard
- End-to-end prediction flow
- Real-world scenarios
Pods not starting?

```powershell
kubectl get pods -A
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace>
```

MLflow not connecting to MinIO?

```powershell
# Check MinIO is accessible
kubectl port-forward svc/minio 9000:9000 -n minio

# Test with the AWS CLI
aws --endpoint-url http://localhost:9000 s3 ls
```

Airflow DAGs not showing?

```powershell
# Check DAG sync
kubectl logs deployment/airflow-scheduler -n airflow
```

Model serving failing?

```powershell
# Check if models are registered
kubectl port-forward svc/mlflow 5000:5000 -n mlflow
# Visit http://localhost:5000/#/models
```

To reset the environment entirely:

```powershell
.\scripts\cleanup.ps1
.\scripts\deploy-all.ps1
```

Telco Customer Churn Dataset
- Source: IBM Watson Analytics Sample Data
- Size: 7,043 customers
- Features: 21 columns
- Target: Churn (Yes/No)
- Churn Rate: ~26.5%
This is an educational project. Feel free to:
- Add new models
- Improve feature engineering
- Enhance the dashboard
- Add monitoring with Prometheus/Grafana
MIT License - Use freely for educational purposes.
Happy Learning!

Built for the MLOps/AIOps Workshop - teaching production ML to experienced professionals.
