End-to-end applied ML pipelines (scikit-learn + PyTorch) with reproducible training, evaluation, and CLI inference.
This repository provides minimal, production-minded scaffolding for classical and simple deep learning tasks:
- deterministic data handling (seeded splits)
- config-driven training
- metric reporting
- artifactized model saving/loading
- CLI inference
- basic tests
The goal is to demonstrate clean applied ML engineering, not comprehensive MLOps or cloud deployment.
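For example, deterministic data handling can be as simple as a split helper that takes an explicit seed (a minimal sketch; the function name and signature below are illustrative, not part of this repository):

```python
import random

def seeded_split(rows, test_fraction=0.2, seed=42):
    """Deterministically shuffle and split rows into train/test lists.

    A local random.Random(seed) keeps the split reproducible without
    touching global RNG state.
    """
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = int(len(rows) * test_fraction)
    test_idx = set(indices[:n_test])
    train = [r for i, r in enumerate(rows) if i not in test_idx]
    test = [r for i, r in enumerate(rows) if i in test_idx]
    return train, test
```

Running the same call twice yields identical splits, which is what makes downstream metrics comparable across runs.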
```
repo/
├─ README.md
├─ requirements.txt
├─ pyproject.toml   # optional
├─ src/
│  ├─ data/
│  ├─ models/
│  ├─ configs/
│  ├─ training/
│  ├─ evaluation/
│  ├─ tests/
│  └─ cli/
└─ notebooks/
```
Conventions:
- `notebooks/` — EDA and prototyping only
- `src/` — reusable training/inference code
- `configs/` — YAML/JSON hyperparameter configs
- `tests/` — lightweight unit tests
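A hyperparameter config might look like the following (a hypothetical JSON example; the keys shown are illustrative, not the repository's actual schema):

```python
import json

# Hypothetical contents of a file such as configs/regression.json;
# every key here is an assumption for illustration.
RAW_CONFIG = """
{
  "model": "ridge",
  "alpha": 1.0,
  "test_fraction": 0.2,
  "seed": 42
}
"""

config = json.loads(RAW_CONFIG)
```

Keeping hyperparameters in a config file rather than in code means a training run is fully described by the config plus the commit hash.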
```bash
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

```bash
python -m src.training.train \
  --config configs/regression.yaml \
  --output artifacts/regression/
```

Evaluation artifacts (metrics, plots, confusion matrices) are written to the output directory defined at train time.
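The metrics portion of that output could be produced along these lines (a standard-library sketch; the helper name and the metrics schema are assumptions, not the repository's actual code):

```python
import json
from pathlib import Path

def write_metrics(output_dir, metrics):
    """Write evaluation metrics as JSON into the run's output directory."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / "metrics.json"
    path.write_text(json.dumps(metrics, indent=2))
    return path
```

Writing metrics as plain JSON next to the model file keeps runs diffable and easy to compare without any tooling.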
```bash
python -m src.cli.predict \
  --model artifacts/regression/model.joblib \
  --input samples/input.csv \
  --output predictions.csv
```

- seeded splits (NumPy / Torch)
- pinned dependencies
- config-driven hyperparameters
- artifacts stored outside source tree
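Seeding in particular can be centralized in one helper (a sketch: the NumPy and Torch calls are the standard seeding APIs, but the helper name and its use here are assumptions):

```python
import os
import random

def seed_everything(seed: int) -> None:
    """Seed every RNG the pipeline touches; skip libraries not installed."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass
```

Calling this once at the top of a training entry point, with the seed taken from the config, makes the whole run a function of the config file.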
Lightweight tests verify data utilities, model initialization, and CLI inference. Run:
```bash
pytest -q
```

| Component | Status |
|---|---|
| Regression (SKL) | WIP |
| Classification | WIP |
| Torch baseline | WIP |
| CLI inference | WIP |
| Tests | WIP |
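Returning to the tests: a unit test in this style might check an artifact save/load round trip (a sketch with a stand-in object and `pickle`; the repository's actual tests use joblib-serialized models not shown here):

```python
import pickle
import tempfile
from pathlib import Path

def test_model_round_trip():
    """Saving and reloading an artifact should preserve its parameters."""
    model = {"coef": [0.5, -1.2], "intercept": 0.1}  # stand-in for a fitted model
    with tempfile.TemporaryDirectory() as d:
        path = Path(d) / "model.pkl"
        path.write_bytes(pickle.dumps(model))
        restored = pickle.loads(path.read_bytes())
    assert restored == model
```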
No large datasets or sensitive data are committed. Small synthetic or toy CSVs may be used for tests.