I build end-to-end ML systems, from raw unstructured data through feature engineering, model training, calibration, and production serving. My work spans computer vision, NLP, reinforcement learning, causal inference, recommendation systems, time-series forecasting, and MLOps. Current focus: extracting spatial features from broadcast video that don't exist in any public dataset and pricing them against live sports markets. Open to roles in ML, Computer Vision, Quantitative Research, and Sports Analytics.
court-vision — Possession-level NBA simulator. Broadcast video in, fractional-Kelly-sized +EV positions out.
Broadcast Video -> YOLOv8n detection -> SIFT homography -> Kalman+Hungarian tracking
-> OSNet re-ID -> EasyOCR -> EventDetector -> CV Features (defender_distance,
spacing_score, legs_fatigue) -> 75-Model ML Stack -> 10K Monte Carlo
-> Fractional Kelly + Ledoit-Wolf correlation -> CLV benchmark vs Pinnacle close
75 trained models · 7 prop models · R² 0.47 on points · +14 bps CLV vs Pinnacle close · 960+ tests
Three CV features carry 31% of SHAP mass in the points model — these don't exist in any public NBA dataset. Walk-forward season-purged evaluation, Shin-devigged closing lines, conformal prediction intervals on every bet.
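
The sizing step at the end of the pipeline can be illustrated for the simplest case: one independent binary bet. This is a minimal sketch, not the court-vision implementation, which sizes correlated positions jointly. The function names and the 0.25 default fraction are assumptions chosen for illustration.

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Full-Kelly stake as a fraction of bankroll for one binary bet."""
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be a probability in [0, 1]")
    if decimal_odds <= 1.0:
        raise ValueError("decimal_odds must exceed 1.0")
    b = decimal_odds - 1.0                # net payout per unit staked
    edge = p_win * b - (1.0 - p_win)      # expected profit per unit staked
    return max(0.0, edge / b)             # stake nothing on a negative-EV bet


def fractional_kelly(p_win: float, decimal_odds: float, fraction: float = 0.25) -> float:
    """Scale the full-Kelly stake down; 0.25 is an illustrative default."""
    return fraction * kelly_fraction(p_win, decimal_odds)


if __name__ == "__main__":
    # A 55% win probability at even money (decimal odds 2.0).
    print(fractional_kelly(0.55, 2.0))
```

Scaling the stake down trades some expected growth for lower variance, which matters when the win probability is itself an estimate.
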
| Project | What It Does |
|---|---|
| court-vision | Full CV+ML NBA pipeline — YOLOv8, homography, multi-object tracking, re-ID, event detection. 75 models, FastAPI + Next.js serving |
| game-film-analyzer | Automated play breakdown — YOLOv8 + ByteTrack + court homography + FastAPI |
| sports-vision-tracker | Multi-object player tracking with Kalman filters, court homography, real-time analytics |
| deep-learning-cv | Deep learning from scratch — NumPy backprop implementation through PyTorch CNNs and ResNet/EfficientNet transfer learning |
| Project | What It Does |
|---|---|
| calibcraft | Platt / isotonic / beta calibration with reliability diagrams, ECE, MCE. Sklearn-compatible |
| kellycorr | Correlated Kelly criterion — full-covariance fractional Kelly sizing for multi-leg portfolios |
| clvtrack | Closing-line value tracker — benchmarks predictions against Pinnacle close to measure edge decay |
| walkforge | Walk-forward + purged-K-fold backtester for ML models. Pandas-in, metrics-out |
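
As an illustration of one metric named above, expected calibration error (ECE) for a binary model can be computed with equal-width probability bins. This is a sketch in plain Python under that binning assumption, not calibcraft's actual API; the function name and the `n_bins` default are hypothetical.

```python
from typing import Sequence


def expected_calibration_error(
    probs: Sequence[float], labels: Sequence[int], n_bins: int = 10
) -> float:
    """ECE for binary predictions using equal-width bins over [0, 1].

    probs are predicted probabilities of the positive class and labels
    are 0/1 outcomes. Each bin contributes |mean prob - observed rate|,
    weighted by its share of the samples.
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equal length")

    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        observed_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(mean_prob - observed_rate)
    return ece


if __name__ == "__main__":
    print(expected_calibration_error([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1]))
```

The result depends on the bin count, so ECE values are comparable only when computed with the same binning.
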
| Project | What It Does |
|---|---|
| rl-portfolio-optimizer | PPO/SAC agents for multi-asset portfolio rebalancing — Gymnasium env with transaction costs, HMM regime detection, walk-forward backtest. Sharpe 1.31 vs 0.91 risk-parity baseline |
| Project | What It Does |
|---|---|
| causal-inference-toolkit | Full causal pipeline — propensity matching, doubly-robust estimation, IV, DiD, uplift modeling, A/B test analysis (CUPED, sequential testing) with DoWhy + EconML |
| Project | What It Does |
|---|---|
| recommendation-system | Hybrid recommender — ALS + content-based + two-tower neural retrieval with FAISS ANN. Re-ranking with popularity debiasing. NDCG@10 0.378 on MovieLens-25M |
| Project | What It Does |
|---|---|
| fraud-detection-engine | Real-time fraud detection — XGBoost + Isolation Forest + LSTM autoencoder ensemble. Sub-100ms inference, SHAP explainability, streaming simulation. AUPRC 0.89 |
| Project | What It Does |
|---|---|
| market-sentiment-nlp | Stock sentiment analysis — FinBERT + traditional NLP for alpha signal generation with walk-forward backtest |
| llm-bi-assistant | LLM-powered BI assistant — natural language to SQL query generation with automatic visualization using Claude API |
| sports-scout-rag | RAG scouting assistant — LangChain + ChromaDB + Claude for natural-language queries over structured sports data |
| Project | What It Does |
|---|---|
| mlops-monitor | Automated ML monitoring — Prefect orchestration + MLflow tracking + Evidently drift detection + Grafana dashboards |
| mlops-pipeline | Production ML pipeline — MLflow experiment tracking, FastAPI serving, Docker containerization, CI/CD |
| realtime-feature-platform | Feature store — point-in-time joins, sliding-window streaming aggregates, online/offline parity monitoring. Redis + DuckDB + FastAPI |
| Project | What It Does |
|---|---|
| linewatch | Real-time line movement tracker — scrapes opening/closing odds, flags sharp action and steam moves across books |
| draft-value-model | NBA draft value model — career WAR prediction from combine + college stats with trade surplus analyzer |
| injury-risk-predictor | NBA injury risk — survival analysis + LightGBM for return-to-play and load management |
| nba-win-probability | Real-time NBA win probability — XGBoost + SHAP + FastAPI + Streamlit |
| nfl-draft-analytics | NFL draft pick value model — survival analysis + regression on historical career outcomes |
| nba-player-analytics | NBA player performance analytics — XGBoost, Random Forest, Neural Networks with Plotly dashboards |
| Project | What It Does |
|---|---|
| time-series-forecasting-suite | Forecasting benchmark — ARIMA, Prophet, LSTM across retail, energy, and web traffic domains |
| customer-churn-predictor | End-to-end churn prediction — ensemble models + SHAP explainability + Streamlit dashboard |
| Project | What It Does |
|---|---|
| enterpriseRevenueEngine | Enterprise revenue engine — end-to-end BI + AI for revenue forecasting and attribution |
| globalSuperstore | Global retail analytics — Power BI dashboards + SQL analysis on international superstore data |
| Project | What It Does |
|---|---|
| archetypingClustering | NBA player archetypes — K-Means clustering on 2023-24 season performance profiles |
| gravityInfluence | Offensive gravity analysis — measuring off-ball influence on spacing and shot quality |
| creationGrade | Shot creation grading — difficulty-adjusted shot generation metric |
| momentumTrend | In-game momentum — run detection and swing-state analysis |
| BasketBallLogic | Basketball analytics — AI-driven game analysis and strategic insights |
| Domain | Tools |
|---|---|
| ML / DL | PyTorch · XGBoost · LightGBM · CatBoost · scikit-learn · Stable-Baselines3 |
| CV | YOLOv8 · OpenCV · SIFT · EasyOCR · OSNet · decord (NVDEC) |
| NLP | LangChain · FinBERT · ChromaDB · Claude API · RAG |
| Causal / Stats | DoWhy · EconML · statsmodels · scipy |
| Data | PostgreSQL · DuckDB · Redis · pandas · Polars · BigQuery · dbt |
| Serving | FastAPI · Streamlit · Next.js · Docker · WebSocket |
| MLOps | MLflow · Prefect · Evidently · Grafana · GitHub Actions |
| Infra | RunPod GPU · FAISS · Kafka · B2 Cloud |
| Languages | Python · SQL · JavaScript · bash |
- Walk-forward, purged, always. K-fold on time-ordered data is a correctness bug. Train on t-1, evaluate on t, purge the overlap.
- Baselines first. Every model has a cheap baseline it must beat. The delta is the headline, not the number.
- CLV over ROI. Realized ROI on small samples is noisy. Closing-line value is an approximately unbiased estimate of edge and converges 5x faster than realized ROI.
- Calibration ≠ accuracy. Reliability diagrams and ECE on every probabilistic model. Miscalibrated models can't be Kelly-sized safely.
- Ship the bug list. Reporting the STL model's R² of 0.09 isn't humility; it's a spec for where the model must not be trusted.
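
The first principle above can be sketched as a split generator. This assumes each sample carries an integer period (for example a season year), an expanding training window, and a purge measured in whole periods; the function name and parameters are hypothetical and are not walkforge's API.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(
    periods: Sequence[int], purge: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs, one per test period, in time order.

    periods[i] is the integer period of sample i. For test period t,
    training uses every sample with period < t - purge, so the `purge`
    periods immediately before t are dropped from training.
    """
    for test_period in sorted(set(periods))[1:]:
        cutoff = test_period - purge
        train_idx = [i for i, p in enumerate(periods) if p < cutoff]
        test_idx = [i for i, p in enumerate(periods) if p == test_period]
        if train_idx:  # skip folds with no usable history
            yield train_idx, test_idx


if __name__ == "__main__":
    seasons = [2020, 2020, 2021, 2022, 2022]
    for train, test in walk_forward_splits(seasons, purge=0):
        print(train, test)
```

Every training index precedes every test index in time, which is the property a shuffled K-fold breaks on time-ordered data.
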

