vpuri3/mlutils.py

mlutils.py

A standalone development repo for mlutils, a set of PyTorch training utilities, plus a minimal reference app in project/.

The mlutils package in this repo is synced from ~/.julia/dev/FLARE-dev.py/mlutils and validated here as a standalone package and workflow.

What Is In This Repo

  • mlutils/: training framework (trainer, callbacks, utils, schedules, models, EMA)
  • project/: minimal runnable example app using mlutils
  • tests/: unit + integration tests (CPU and optional GPU smoke tests)

Repository Layout

mlutils.py/
├── mlutils/
│   ├── trainer.py
│   ├── callbacks.py
│   ├── utils.py
│   ├── schedule.py
│   ├── models.py
│   └── ema.py
├── project/
│   ├── __main__.py
│   ├── callbacks.py
│   ├── utils.py
│   ├── datasets/
│   │   ├── dummy.py
│   │   └── utils.py
│   └── models/
│       └── transformer.py
├── tests/
├── pyproject.toml
└── scripts/

Environment Setup

This repo uses Python 3.11+ and works well with uv.

uv venv
uv sync
source .venv/bin/activate

If .venv already exists:

source .venv/bin/activate

Quick Start

1. Inspect CLI

python -m project --help

2. Train (new experiment)

python -m project \
  --train true \
  --dataset dummy \
  --epochs 10 \
  --batch_size 64 \
  --exp_name demo_run

3. Restart from latest checkpoint

python -m project \
  --restart true \
  --exp_name demo_run

4. Evaluate latest checkpoint

python -m project \
  --evaluate true \
  --exp_name demo_run
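
The commands above pass booleans as explicit values (`--train true`) rather than as bare switches. As a rough illustration of how such flags can be parsed, here is a minimal argparse sketch. The flag names come from the commands above; the `str2bool` helper, the `build_parser` function, and all default values are assumptions made for illustration, and the actual CLI in project/__main__.py may be built differently.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'true'/'false'-style strings, as used in `--train true`."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: flag names mirror the Quick Start commands,
    # defaults are placeholders and not taken from the repo.
    parser = argparse.ArgumentParser(prog="project")
    parser.add_argument("--train", type=str2bool, default=False)
    parser.add_argument("--restart", type=str2bool, default=False)
    parser.add_argument("--evaluate", type=str2bool, default=False)
    parser.add_argument("--dataset", type=str, default="dummy")
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch_size", type=int, default=64)
    parser.add_argument("--exp_name", type=str, default="exp")
    return parser


# Parse the same arguments as the "Train (new experiment)" command.
args = build_parser().parse_args(
    ["--train", "true", "--dataset", "dummy", "--epochs", "10",
     "--batch_size", "64", "--exp_name", "demo_run"]
)
```

Run `python -m project --help` to see the real set of flags and defaults.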

Output Layout

Experiment outputs are written to:

  • out/project/<exp_name>/config.yaml
  • out/project/<exp_name>/ckptXX/ (periodic checkpoints; XX is a zero-padded index such as 00, 01, 02)
  • out/project/<exp_name>/eval/ (evaluation artifacts)

Common artifacts:

  • model.pt
  • stats.json
  • model_stats.json
  • losses.png, grad_norm.png, learning_rate.png

Example output for --exp_name demo_run:

out/
└── project/
    └── demo_run/
        ├── config.yaml
        ├── model_stats.json
        ├── losses.png
        ├── grad_norm.png
        ├── learning_rate.png
        ├── ckpt00/
        │   ├── model.pt
        │   ├── stats.json
        │   ├── model_stats.json
        │   ├── losses.png
        │   ├── grad_norm.png
        │   └── learning_rate.png
        ├── ckpt01/
        │   └── ...
        ├── ckpt02/
        │   └── ...
        └── eval/
            ├── model.pt
            ├── stats.json
            ├── model_stats.json
            ├── losses.png
            ├── grad_norm.png
            └── learning_rate.png
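
Restarting or evaluating from the "latest checkpoint" means picking the highest-numbered ckptXX directory in the layout above. The sketch below shows one way to locate it. `latest_checkpoint` is a hypothetical helper written for this example, not a function from mlutils, and the trainer may select checkpoints differently.

```python
import re
from pathlib import Path
from typing import Optional

# Matches directory names such as ckpt00, ckpt01, ckpt12.
CKPT_PATTERN = re.compile(r"^ckpt(\d+)$")


def latest_checkpoint(exp_dir: Path) -> Optional[Path]:
    """Return the highest-numbered ckptXX directory under exp_dir, or None."""
    if not exp_dir.is_dir():
        return None
    best_index, best_dir = -1, None
    for child in exp_dir.iterdir():
        match = CKPT_PATTERN.match(child.name)
        if child.is_dir() and match and int(match.group(1)) > best_index:
            best_index, best_dir = int(match.group(1)), child
    return best_dir
```

For the demo_run tree shown above, `latest_checkpoint(Path("out/project/demo_run"))` would return the ckpt02 directory; the eval/ directory is ignored because its name does not match the pattern.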

Dummy Dataset

project/datasets/dummy.py creates a synthetic regression dataset when the dataset files are not already present on disk:

  • Inputs: fixed 2D grid points, shape [N, 1024, 2]
  • Targets: sin(pi*x) * sin(pi*y), shape [N, 1024, 1]

load_dataset("dummy", ...) returns an 80/20 train/test split plus metadata (in_dim=2, out_dim=1, normalizers).
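
To make the shapes concrete, here is a minimal sketch of one sample of such a dataset and an 80/20 index split, written with plain Python lists so that it runs without PyTorch. It assumes the 1024 points form a 32 x 32 grid on the unit square, which the source does not state. `make_grid`, `make_targets`, and `split_indices` are hypothetical names and are not the functions in project/datasets/dummy.py.

```python
import math


def make_grid(side: int = 32) -> list[tuple[float, float]]:
    """Return side*side (x, y) points on a uniform grid over [0, 1]^2 (assumed domain)."""
    step = 1.0 / (side - 1)
    return [(i * step, j * step) for i in range(side) for j in range(side)]


def make_targets(points: list[tuple[float, float]]) -> list[list[float]]:
    """Evaluate sin(pi*x) * sin(pi*y) at each point; one output channel per point."""
    return [[math.sin(math.pi * x) * math.sin(math.pi * y)] for x, y in points]


def split_indices(n: int, train_frac: float = 0.8) -> tuple[list[int], list[int]]:
    """Split sample indices 0..n-1 into contiguous train and test parts."""
    n_train = int(round(n * train_frac))
    indices = list(range(n))
    return indices[:n_train], indices[n_train:]


points = make_grid()            # 1024 points with 2 coordinates each
targets = make_targets(points)  # 1024 targets with 1 value each
train_idx, test_idx = split_indices(100)
```

Stacking N such samples gives inputs of shape [N, 1024, 2] and targets of shape [N, 1024, 1], which matches in_dim=2 and out_dim=1 in the metadata.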

Testing

Run all tests:

pytest -q

Run only integration tests:

pytest -q -m integration

Notes:

  • GPU smoke tests auto-skip when CUDA is unavailable.
  • CPU smoke/eval flow tests always run.
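
One common way to get this auto-skip behavior in pytest is a `skipif` marker that checks for CUDA when the test module is collected. The sketch below illustrates that pattern only; `cuda_available`, `requires_cuda`, and `test_gpu_smoke` are hypothetical names, and the tests in tests/ may implement the skip differently.

```python
import importlib.util

import pytest


def cuda_available() -> bool:
    """Return True only if torch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


# Reusable marker: tests decorated with it are skipped on CPU-only machines.
requires_cuda = pytest.mark.skipif(
    not cuda_available(), reason="CUDA is unavailable"
)


@pytest.mark.integration
@requires_cuda
def test_gpu_smoke():
    import torch

    # Tiny computation on the GPU to confirm the device works end to end.
    x = torch.ones(4, device="cuda")
    assert x.sum().item() == 4.0
```

With this pattern, `pytest -q -m integration` still collects the GPU test on a CPU-only machine but reports it as skipped instead of failed.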

Syncing mlutils From FLARE-dev

This repo includes a skill-oriented sync workflow at:

  • ~/.codex/skills/flare-mlutils-sync/

Core scripts:

  • sync_mlutils_from_flare.sh: sync mlutils/ from FLARE-dev
  • validate_mlutils_sync.sh: run test + smoke validation

License

MIT (see LICENSE).

About

ML project template in PyTorch
