Neel Shah neeljshah

Neel J. Shah

ML Engineer · Computer Vision · Probabilistic Modeling · Sports Quant

I build end-to-end ML systems, from raw unstructured data through feature engineering, model training, calibration, and production serving. My work spans computer vision, NLP, reinforcement learning, causal inference, recommendation systems, time-series forecasting, and MLOps. Current focus: extracting spatial features from broadcast video that don't exist in any public dataset and pricing them against live sports markets. Open to roles in ML, Computer Vision, Quantitative Research, and Sports Analytics.

Flagship — CourtVision

court-vision — Possession-level NBA simulator. Broadcast video in, fractional-Kelly-sized +EV positions out.

Broadcast Video -> YOLOv8n detection -> SIFT homography -> Kalman+Hungarian tracking
  -> OSNet re-ID -> EasyOCR -> EventDetector -> CV Features (defender_distance,
    spacing_score, legs_fatigue) -> 75-Model ML Stack -> 10K Monte Carlo
  -> Fractional Kelly + Ledoit-Wolf correlation -> CLV benchmark vs Pinnacle close

75 trained models · 7 prop models · R² 0.47 pts · +14 bps CLV vs Pinnacle close · 960+ tests

Three CV features carry 31% of SHAP mass in the points model — these don't exist in any public NBA dataset. Walk-forward season-purged evaluation, Shin-devigged closing lines, conformal prediction intervals on every bet.
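For readers unfamiliar with the sizing step, here is a minimal sketch of single-bet fractional Kelly. It is an illustration only, not the court-vision implementation: the function names, the 0.25 default fraction, and the choice to ignore cross-bet correlation (which the pipeline above handles with a Ledoit-Wolf estimate) are all assumptions made for this example.

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Full-Kelly bankroll fraction for one binary bet.

    p_win is the model's win probability; decimal_odds is the offered price.
    Returns 0.0 when the bet has no positive edge.
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be in [0, 1]")
    b = decimal_odds - 1.0  # net payout per unit staked
    if b <= 0.0:
        raise ValueError("decimal_odds must be greater than 1")
    f = (b * p_win - (1.0 - p_win)) / b
    return max(f, 0.0)


def fractional_kelly_stake(
    p_win: float,
    decimal_odds: float,
    bankroll: float,
    fraction: float = 0.25,  # hypothetical default, not taken from the project
) -> float:
    """Stake sized at a fixed fraction of full Kelly."""
    return bankroll * fraction * kelly_fraction(p_win, decimal_odds)


if __name__ == "__main__":
    # A 55% estimate at even money (decimal odds 2.0) has a full-Kelly
    # fraction of about 0.10, so quarter Kelly stakes about 2.5% of bankroll.
    print(fractional_kelly_stake(0.55, 2.0, bankroll=1000.0))
```

Scaling the full-Kelly fraction down trades some expected growth for lower variance, which matters when the win probability itself is an estimate.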

What I Build

Computer Vision & Tracking

Project	What It Does
court-vision	Full CV+ML NBA pipeline — YOLOv8, homography, multi-object tracking, re-ID, event detection. 75 models, FastAPI + Next.js serving
game-film-analyzer	Automated play breakdown — YOLOv8 + ByteTrack + court homography + FastAPI
sports-vision-tracker	Multi-object player tracking with Kalman filters, court homography, real-time analytics
deep-learning-cv	Deep learning from scratch — NumPy backprop implementation through PyTorch CNNs and ResNet/EfficientNet transfer learning

Probabilistic Modeling & Calibration

Project	What It Does
calibcraft	Platt / isotonic / beta calibration with reliability diagrams, ECE, MCE. Sklearn-compatible
kellycorr	Correlated Kelly criterion — full-covariance fractional Kelly sizing for multi-leg portfolios
clvtrack	Closing-line value tracker — benchmarks predictions against Pinnacle close to measure edge decay
walkforge	Walk-forward + purged-K-fold backtester for ML models. Pandas-in, metrics-out
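As a rough illustration of the calibration metrics named in this table, the sketch below computes ECE and MCE for binary probabilities with equal-width bins, using only the standard library. It does not reproduce the calibcraft API; the function name, signature, and binning scheme are assumptions for this example.

```python
from typing import List, Sequence, Tuple


def calibration_errors(
    probs: Sequence[float],
    labels: Sequence[int],
    n_bins: int = 10,
) -> Tuple[float, float]:
    """Return (ECE, MCE) for binary predictions using equal-width bins.

    Each bin compares the mean predicted probability with the observed
    positive rate. ECE weights the gaps by bin size; MCE takes the largest.
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equal length")

    # Per bin: [count, sum of predicted probabilities, sum of labels]
    bins: List[List[float]] = [[0.0, 0.0, 0.0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx][0] += 1
        bins[idx][1] += p
        bins[idx][2] += y

    n = len(probs)
    ece, mce = 0.0, 0.0
    for count, sum_p, sum_y in bins:
        if count == 0:
            continue  # empty bins contribute nothing
        gap = abs(sum_y / count - sum_p / count)
        ece += (count / n) * gap
        mce = max(mce, gap)
    return ece, mce


if __name__ == "__main__":
    ece, mce = calibration_errors([0.85, 0.85, 0.15, 0.15], [1, 1, 0, 0])
    print(f"ECE={ece:.3f} MCE={mce:.3f}")
```

Equal-width bins are the simplest choice; equal-mass (quantile) bins are a common alternative when predictions cluster near 0 or 1.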

Reinforcement Learning & Optimization

Project	What It Does
rl-portfolio-optimizer	PPO/SAC agents for multi-asset portfolio rebalancing — Gymnasium env with transaction costs, HMM regime detection, walk-forward backtest. Sharpe 1.31 vs 0.91 risk-parity baseline

Causal Inference & Experimentation

Project	What It Does
causal-inference-toolkit	Full causal pipeline — propensity matching, doubly-robust estimation, IV, DiD, uplift modeling, A/B test analysis (CUPED, sequential testing) with DoWhy + EconML

Recommendation Systems

Project	What It Does
recommendation-system	Hybrid recommender — ALS + content-based + two-tower neural retrieval with FAISS ANN. Re-ranking with popularity debiasing. NDCG@10 0.378 on MovieLens-25M

Fraud & Anomaly Detection

Project	What It Does
fraud-detection-engine	Real-time fraud detection — XGBoost + Isolation Forest + LSTM autoencoder ensemble. Sub-100ms inference, SHAP explainability, streaming simulation. AUPRC 0.89

NLP & Language Models

Project	What It Does
market-sentiment-nlp	Stock sentiment analysis — FinBERT + traditional NLP for alpha signal generation with walk-forward backtest
llm-bi-assistant	LLM-powered BI assistant — natural language to SQL query generation with automatic visualization using Claude API
sports-scout-rag	RAG scouting assistant — LangChain + ChromaDB + Claude for natural-language queries over structured sports data

MLOps & Infrastructure

Project	What It Does
mlops-monitor	Automated ML monitoring — Prefect orchestration + MLflow tracking + Evidently drift detection + Grafana dashboards
mlops-pipeline	Production ML pipeline — MLflow experiment tracking, FastAPI serving, Docker containerization, CI/CD
realtime-feature-platform	Feature store — point-in-time joins, sliding-window streaming aggregates, online/offline parity monitoring. Redis + DuckDB + FastAPI

Sports Analytics

Project	What It Does
linewatch	Real-time line movement tracker — scrapes opening/closing odds, flags sharp action and steam moves across books
draft-value-model	NBA draft value model — career WAR prediction from combine + college stats with trade surplus analyzer
injury-risk-predictor	NBA injury risk — survival analysis + LightGBM for return-to-play and load management
nba-win-probability	Real-time NBA win probability — XGBoost + SHAP + FastAPI + Streamlit
nfl-draft-analytics	NFL draft pick value model — survival analysis + regression on historical career outcomes
nba-player-analytics	NBA player performance analytics — XGBoost, Random Forest, Neural Networks with Plotly dashboards

Time Series & Forecasting

Project	What It Does
time-series-forecasting-suite	Forecasting benchmark — ARIMA, Prophet, LSTM across retail, energy, and web traffic domains
customer-churn-predictor	End-to-end churn prediction — ensemble models + SHAP explainability + Streamlit dashboard

Data Analytics & BI

Project	What It Does
enterpriseRevenueEngine	Enterprise revenue engine — end-to-end BI + AI for revenue forecasting and attribution
globalSuperstore	Global retail analytics — Power BI dashboards + SQL analysis on international superstore data

Original NBA Research

Project	What It Does
archetypingClustering	NBA player archetypes — K-Means clustering on 2023-24 season performance profiles
gravityInfluence	Offensive gravity analysis — measuring off-ball influence on spacing and shot quality
creationGrade	Shot creation grading — difficulty-adjusted shot generation metric
momentumTrend	In-game momentum — run detection and swing-state analysis
BasketBallLogic	Basketball analytics — AI-driven game analysis and strategic insights

Technical Stack

Domain	Tools
ML / DL	PyTorch · XGBoost · LightGBM · CatBoost · scikit-learn · Stable-Baselines3
CV	YOLOv8 · OpenCV · SIFT · EasyOCR · OSNet · decord (NVDEC)
NLP	LangChain · FinBERT · ChromaDB · Claude API · RAG
Causal / Stats	DoWhy · EconML · statsmodels · scipy
Data	PostgreSQL · DuckDB · Redis · pandas · Polars · BigQuery · dbt
Serving	FastAPI · Streamlit · Next.js · Docker · WebSocket
MLOps	MLflow · Prefect · Evidently · Grafana · GitHub Actions
Infra	RunPod GPU · FAISS · Kafka · B2 Cloud
Languages	Python · SQL · JavaScript · bash

Research Principles

Walk-forward, purged, always. K-fold on time-ordered data is a correctness bug. Train on t-1, evaluate on t, purge the overlap.
Baselines first. Every model has a cheap baseline it must beat. The delta is the headline, not the number.
CLV over ROI. Realized ROI on small samples is noisy. Closing-line value is approximately unbiased and converges 5x faster.
Calibration ≠ accuracy. Reliability diagrams and ECE on every probabilistic model. Miscalibrated models can't be Kelly-sized safely.
Ship the bug list. Reporting the STL model's R² of 0.09 isn't humility; it's a spec for where the model must not be trusted.
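The first principle above can be sketched as a small split generator. This is a simplified illustration, not the walkforge or court-vision code: it assumes rows are already in chronological order, that each row carries an integer season label, and that purging means dropping a fixed number of the most recent training rows before each test season. The function name and parameters are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(
    seasons: Sequence[int],
    purge: int = 0,
    min_train_seasons: int = 1,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs, one per test season.

    seasons[i] is the season label of row i, with rows in chronological
    order. Training uses every earlier season, minus the last `purge` rows,
    so that labels overlapping the test window cannot leak into training.
    """
    unique = sorted(set(seasons))
    for k in range(min_train_seasons, len(unique)):
        test_season = unique[k]
        test_idx = [i for i, s in enumerate(seasons) if s == test_season]
        train_idx = [i for i, s in enumerate(seasons) if s < test_season]
        if purge > 0:
            # Drop the training rows closest to the test boundary.
            train_idx = train_idx[:-purge]
        yield train_idx, test_idx


if __name__ == "__main__":
    labels = [2021] * 3 + [2022] * 3 + [2023] * 2
    for train, test in walk_forward_splits(labels, purge=1):
        print("train", train, "test", test)
```

Unlike shuffled K-fold, every test row here is strictly later than every training row, so the evaluation never sees the future.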

neeljshah22@gmail.com · LinkedIn · Portfolio
