neeljshah/README.md

Neel J. Shah

ML Engineer · Computer Vision · Probabilistic Modeling · Sports Quant

Email · Portfolio · LinkedIn


I build end-to-end ML systems, from raw unstructured data through feature engineering, model training, calibration, and production serving. My work spans computer vision, NLP, reinforcement learning, causal inference, recommendation systems, time-series forecasting, and MLOps. Current focus: extracting spatial features from broadcast video that don't exist in any public dataset and pricing them against live sports markets. Open to roles in ML, computer vision, quantitative research, and sports analytics.


Flagship — CourtVision

court-vision — Possession-level NBA simulator. Broadcast video in, fractional-Kelly-sized +EV positions out.

Broadcast Video -> YOLOv8n detection -> SIFT homography -> Kalman+Hungarian tracking
  -> OSNet re-ID -> EasyOCR -> EventDetector -> CV Features (defender_distance,
    spacing_score, legs_fatigue) -> 75-Model ML Stack -> 10K Monte Carlo
  -> Fractional Kelly + Ledoit-Wolf correlation -> CLV benchmark vs Pinnacle close
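
The sizing step at the end of the pipeline can be illustrated with a minimal single-bet sketch. It assumes a binary outcome priced at decimal odds; the function names and the 0.25 multiplier are illustrative and not taken from court-vision, and the correlation adjustment named in the pipeline is not modeled here.

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Full-Kelly fraction of bankroll for a binary bet; 0.0 when there is no edge."""
    net_odds = decimal_odds - 1.0  # profit per unit staked on a win
    if net_odds <= 0.0:
        raise ValueError("decimal_odds must be greater than 1.0")
    full_kelly = (p_win * decimal_odds - 1.0) / net_odds
    return max(full_kelly, 0.0)  # never size a negative-EV bet


def fractional_kelly_stake(bankroll: float, p_win: float,
                           decimal_odds: float, fraction: float = 0.25) -> float:
    """Stake = bankroll * fraction * full-Kelly fraction (fraction is a hypothetical default)."""
    return bankroll * fraction * kelly_fraction(p_win, decimal_odds)


if __name__ == "__main__":
    # 55% win probability at even money (decimal odds 2.0), quarter Kelly.
    print(fractional_kelly_stake(1000.0, 0.55, 2.0))
```

Scaling the full-Kelly fraction down trades some expected growth for lower variance, which matters when the win probability is itself an estimate.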

75 trained models · 7 prop models · R² 0.47 (points model) · +14 bps CLV vs Pinnacle close · 960+ tests

Three CV features carry 31% of SHAP mass in the points model, and none of them exists in any public NBA dataset. Evaluation is walk-forward and season-purged, closing lines are Shin-devigged, and every bet carries a conformal prediction interval.
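
As a rough illustration of Shin devigging, the sketch below solves for Shin's insider-trading parameter z by bisection so that the adjusted probabilities sum to 1. The function name, tolerance, and search bounds are assumptions made for this example, not court-vision's actual code.

```python
import math


def shin_devig(decimal_odds, tol=1e-12, max_iter=200):
    """Return vig-free probabilities from decimal odds using Shin's method."""
    if any(o <= 1.0 for o in decimal_odds):
        raise ValueError("all decimal odds must be greater than 1.0")
    implied = [1.0 / o for o in decimal_odds]
    booksum = sum(implied)
    if booksum <= 1.0:
        # No overround to remove: fall back to simple normalization.
        return [pi / booksum for pi in implied]

    def probs(z):
        return [
            (math.sqrt(z * z + 4.0 * (1.0 - z) * pi * pi / booksum) - z)
            / (2.0 * (1.0 - z))
            for pi in implied
        ]

    # The probabilities sum to more than 1 at z = 0 and to less than 1
    # as z approaches 1, so a root lies in between.
    lo, hi = 0.0, 0.999
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if sum(probs(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return probs(0.5 * (lo + hi))


if __name__ == "__main__":
    print(shin_devig([1.5, 2.8]))
```

Unlike proportional normalization, this approach removes relatively more margin from longshots than from favorites.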


What I Build

Computer Vision & Tracking

| Project | What It Does |
| --- | --- |
| court-vision | Full CV+ML NBA pipeline: YOLOv8, homography, multi-object tracking, re-ID, event detection. 75 models, FastAPI + Next.js serving |
| game-film-analyzer | Automated play breakdown: YOLOv8 + ByteTrack + court homography + FastAPI |
| sports-vision-tracker | Multi-object player tracking with Kalman filters, court homography, real-time analytics |
| deep-learning-cv | Deep learning from scratch: NumPy backprop implementation through PyTorch CNNs and ResNet/EfficientNet transfer learning |
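
Court homography, which several of these projects rely on, comes down to mapping an image pixel into court coordinates through a 3x3 matrix. The sketch below shows only that projection step; the matrix values are made up for illustration, and in practice H would be estimated from matched keypoints (for example with OpenCV's `cv2.findHomography`).

```python
def project_point(H, x, y):
    """Map image pixel (x, y) to court coordinates with a 3x3 homography H.

    H is a row-major nested list. The point is lifted to homogeneous
    coordinates, multiplied by H, then divided by the third component.
    """
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return u / w, v / w


if __name__ == "__main__":
    # Hypothetical matrix: scale pixels by 0.05 and shift the origin.
    H_example = [[0.05, 0.0, -10.0],
                 [0.0, 0.05, -5.0],
                 [0.0, 0.0, 1.0]]
    print(project_point(H_example, 640, 360))
```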

Probabilistic Modeling & Calibration

| Project | What It Does |
| --- | --- |
| calibcraft | Platt / isotonic / beta calibration with reliability diagrams, ECE, MCE. Sklearn-compatible |
| kellycorr | Correlated Kelly criterion: full-covariance fractional Kelly sizing for multi-leg portfolios |
| clvtrack | Closing-line value tracker: benchmarks predictions against Pinnacle close to measure edge decay |
| walkforge | Walk-forward + purged-K-fold backtester for ML models. Pandas-in, metrics-out |
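
For readers unfamiliar with ECE, here is a minimal sketch of expected calibration error with equal-width bins. The function name and the default of 10 bins are assumptions for this example and do not reflect calibcraft's API.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average gap between mean predicted probability and observed
    frequency, over equal-width probability bins."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        confidence = sum(p for p, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - confidence)
    return ece


if __name__ == "__main__":
    # Predictions of 0.8 that come true 4 times out of 5 are well calibrated.
    print(expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0]))
```

MCE follows the same binning but takes the maximum gap across bins instead of the weighted average.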

Reinforcement Learning & Optimization

| Project | What It Does |
| --- | --- |
| rl-portfolio-optimizer | PPO/SAC agents for multi-asset portfolio rebalancing: Gymnasium env with transaction costs, HMM regime detection, walk-forward backtest. Sharpe 1.31 vs 0.91 risk-parity baseline |

Causal Inference & Experimentation

| Project | What It Does |
| --- | --- |
| causal-inference-toolkit | Full causal pipeline: propensity matching, doubly-robust estimation, IV, DiD, uplift modeling, A/B test analysis (CUPED, sequential testing) with DoWhy + EconML |
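
Of the techniques listed, CUPED is compact enough to sketch: it subtracts the part of the experiment metric that a pre-experiment covariate explains, which keeps the mean and reduces variance. The function below is an illustrative, dependency-free version and is not the toolkit's implementation; in a real A/B test theta would be estimated on data pooled across arms.

```python
def cuped_adjust(metric, covariate):
    """Return CUPED-adjusted metric values: y - theta * (x - mean(x)),
    with theta = cov(x, y) / var(x)."""
    n = len(metric)
    if n != len(covariate) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(covariate) / n
    mean_y = sum(metric) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, metric))
    var_x = sum((x - mean_x) ** 2 for x in covariate)
    if var_x == 0.0:
        return list(metric)  # a constant covariate carries no information
    theta = cov_xy / var_x
    return [y - theta * (x - mean_x) for x, y in zip(covariate, metric)]


if __name__ == "__main__":
    pre_period = [1.0, 2.0, 3.0, 4.0]
    in_experiment = [2.1, 3.9, 6.2, 7.8]
    print(cuped_adjust(in_experiment, pre_period))
```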

Recommendation Systems

| Project | What It Does |
| --- | --- |
| recommendation-system | Hybrid recommender: ALS + content-based + two-tower neural retrieval with FAISS ANN. Re-ranking with popularity debiasing. NDCG@10 0.378 on MovieLens-25M |
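
To make the NDCG@10 figure concrete, the sketch below computes NDCG@k for one user with linear gains. It assumes `ranked_relevances` contains the relevance of every candidate item for that user, so the ideal ordering can be derived by sorting; the names are illustrative and the project may use a different gain variant.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k positions (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal == 0.0:
        return 0.0  # no relevant items: define NDCG as 0
    return dcg_at_k(ranked_relevances, k) / ideal


if __name__ == "__main__":
    # The single relevant item is ranked second instead of first.
    print(ndcg_at_k([0, 1, 0]))
```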

Fraud & Anomaly Detection

| Project | What It Does |
| --- | --- |
| fraud-detection-engine | Real-time fraud detection: XGBoost + Isolation Forest + LSTM autoencoder ensemble. Sub-100ms inference, SHAP explainability, streaming simulation. AUPRC 0.89 |

NLP & Language Models

| Project | What It Does |
| --- | --- |
| market-sentiment-nlp | Stock sentiment analysis: FinBERT + traditional NLP for alpha signal generation with walk-forward backtest |
| llm-bi-assistant | LLM-powered BI assistant: natural-language-to-SQL query generation with automatic visualization, using the Claude API |
| sports-scout-rag | RAG scouting assistant: LangChain + ChromaDB + Claude for natural-language queries over structured sports data |

MLOps & Infrastructure

| Project | What It Does |
| --- | --- |
| mlops-monitor | Automated ML monitoring: Prefect orchestration + MLflow tracking + Evidently drift detection + Grafana dashboards |
| mlops-pipeline | Production ML pipeline: MLflow experiment tracking, FastAPI serving, Docker containerization, CI/CD |
| realtime-feature-platform | Feature store: point-in-time joins, sliding-window streaming aggregates, online/offline parity monitoring. Redis + DuckDB + FastAPI |
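
A point-in-time join returns, for each event, the latest feature value known at or before the event timestamp, so no future information leaks into training rows. The sketch below shows that lookup for a single feature using only the standard library; the function name and data layout are hypothetical, and on DataFrames `pandas.merge_asof` offers a comparable backward-looking join.

```python
from bisect import bisect_right


def point_in_time_lookup(feature_history, event_time):
    """Return the most recent feature value with timestamp <= event_time.

    feature_history is a list of (timestamp, value) pairs sorted by timestamp.
    Returns None when no value was known yet at event_time.
    """
    timestamps = [ts for ts, _ in feature_history]
    idx = bisect_right(timestamps, event_time) - 1
    return feature_history[idx][1] if idx >= 0 else None


if __name__ == "__main__":
    history = [(1, "a"), (5, "b"), (9, "c")]  # hypothetical feature updates
    for t in (0, 5, 7, 10):
        print(t, point_in_time_lookup(history, t))
```

Whether a value stamped exactly at the event time counts as known is a design choice; this sketch includes it, and switching to `bisect_left` would exclude it.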

Sports Analytics

| Project | What It Does |
| --- | --- |
| linewatch | Real-time line movement tracker: scrapes opening/closing odds, flags sharp action and steam moves across books |
| draft-value-model | NBA draft value model: career WAR prediction from combine + college stats with trade surplus analyzer |
| injury-risk-predictor | NBA injury risk: survival analysis + LightGBM for return-to-play and load management |
| nba-win-probability | Real-time NBA win probability: XGBoost + SHAP + FastAPI + Streamlit |
| nfl-draft-analytics | NFL draft pick value model: survival analysis + regression on historical career outcomes |
| nba-player-analytics | NBA player performance analytics: XGBoost, Random Forest, Neural Networks with Plotly dashboards |

Time Series & Forecasting

| Project | What It Does |
| --- | --- |
| time-series-forecasting-suite | Forecasting benchmark: ARIMA, Prophet, LSTM across retail, energy, and web traffic domains |
| customer-churn-predictor | End-to-end churn prediction: ensemble models + SHAP explainability + Streamlit dashboard |

Data Analytics & BI

| Project | What It Does |
| --- | --- |
| enterpriseRevenueEngine | Enterprise revenue engine: end-to-end BI + AI for revenue forecasting and attribution |
| globalSuperstore | Global retail analytics: Power BI dashboards + SQL analysis on international superstore data |

Original NBA Research

| Project | What It Does |
| --- | --- |
| archetypingClustering | NBA player archetypes: K-Means clustering on 2023-24 season performance profiles |
| gravityInfluence | Offensive gravity analysis: measuring off-ball influence on spacing and shot quality |
| creationGrade | Shot creation grading: difficulty-adjusted shot generation metric |
| momentumTrend | In-game momentum: run detection and swing-state analysis |
| BasketBallLogic | Basketball analytics: AI-driven game analysis and strategic insights |

Technical Stack

| Domain | Tools |
| --- | --- |
| ML / DL | PyTorch · XGBoost · LightGBM · CatBoost · scikit-learn · Stable-Baselines3 |
| CV | YOLOv8 · OpenCV · SIFT · EasyOCR · OSNet · decord (NVDEC) |
| NLP | LangChain · FinBERT · ChromaDB · Claude API · RAG |
| Causal / Stats | DoWhy · EconML · statsmodels · scipy |
| Data | PostgreSQL · DuckDB · Redis · pandas · Polars · BigQuery · dbt |
| Serving | FastAPI · Streamlit · Next.js · Docker · WebSocket |
| MLOps | MLflow · Prefect · Evidently · Grafana · GitHub Actions |
| Infra | RunPod GPU · FAISS · Kafka · B2 Cloud |
| Languages | Python · SQL · JavaScript · bash |

Research Principles

  • Walk-forward, purged, always. K-fold on time-ordered data is a correctness bug. Train on data through t-1, evaluate on t, purge the overlap.
  • Baselines first. Every model has a cheap baseline it must beat. The delta is the headline, not the number.
  • CLV over ROI. Realized ROI on small samples is noisy. Closing-line value is approximately unbiased and converges 5x faster.
  • Calibration ≠ accuracy. Reliability diagrams and ECE on every probabilistic model. Miscalibrated models can't be Kelly-sized safely.
  • Ship the bug list. The STL model R²=0.09 isn't humility — it's a spec for where the model must not be trusted.
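
The first principle above can be sketched as a split generator. This is a simplified illustration, not walkforge's API: it assumes each row carries a period label such as a season, and it implements purging as a gap of whole periods dropped between training and test data, which is closer to an embargo than to label-overlap purging.

```python
def walk_forward_splits(periods, min_train_periods=1, purge=0):
    """Yield (train_indices, test_indices) pairs in chronological order.

    periods: one sortable period label per row (for example the season).
    purge: number of periods immediately before the test period to drop
           from training (hypothetical parameter for this sketch).
    """
    ordered = sorted(set(periods))
    for i, test_period in enumerate(ordered):
        train_periods = set(ordered[: max(i - purge, 0)])
        if len(train_periods) < min_train_periods:
            continue  # not enough history yet
        train_idx = [j for j, p in enumerate(periods) if p in train_periods]
        test_idx = [j for j, p in enumerate(periods) if p == test_period]
        yield train_idx, test_idx


if __name__ == "__main__":
    seasons = [2020, 2020, 2021, 2021, 2022, 2023]
    for train, test in walk_forward_splits(seasons, purge=1):
        print(train, test)
```

Every test period is strictly later than all of its training periods, so each fold evaluates the model only on data it could not have seen.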

Popular repositories

  1. neeljshah (Public)

    My Portfolio

  2. court-vision (Public)

    End-to-end NBA analytics and prediction system: CV player tracking, ML models, betting edge detection, analytics dashboards, AI chat

  3. breastCancer (Public)

    Breast Cancer Data Set Wisconsin

  4. housingPrice (Public)

    Ames Housing Price DataSet

  5. onlineRetail (Public)

    Data Set From Online Retailing in Europe

  6. globalSuperstore (Public)

    Global retail analytics: Power BI dashboards + SQL analysis on international superstore dataset.