parameter-golf-local is a distilled spinout of the main Parameter Golf challenge for local hardware.
The goal is the same in spirit: train the best language model you can under a strict artifact budget, then score it by tokenizer-agnostic compression on held-out FineWeb data. The difference is that this version is scaled for hardware people can actually own.
| Setting | Local Track |
|---|---|
| Artifact cap | 2,000,000 bytes total |
| Reference training device | 1x RTX 4090 |
| Reference wallclock | 30 minutes |
| Official record lane | single-device CUDA |
| Starter Apple lane | MLX open-hardware / non-record |
| Training tokenizer | SentencePiece 1024 |
| Starter model | 6 layers, 192 dim, tied embeddings |
| Training sequence length | 512 |
| Default evaluation sequence length | 1024 |
| Default evaluation split | first 4,194,304 validation tokens |
| Default training download | first 2 train shards |
The local track shrinks four things together:
- model size
- training token throughput
- evaluation cost
- wallclock budget
That keeps the contest meaningfully about modeling and compression instead of turning it into a pure kernel race.
The repo now publishes a conservative starter baseline instead of leaving the README blank:
| Label | Score | Notes |
|---|---|---|
| Published baseline | 2.3027 | Conservative public baseline |
| Measured Apple run | 1.6448 | 30 min MLX run on an Apple M4 Pro, using 1 local train shard |
How the published baseline was chosen:
- the measured Apple run reached `final_sliding_window_exact val_bpb = 1.64477742`
- by policy, the public baseline is that score with 40% added
- published baseline score = `1.64477742 * 1.4 = 2.30268839`, shown as `2.3027`
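The margin arithmetic above is simple enough to sanity-check in a few lines (a sketch only; the variable names here are illustrative, not part of the repo):

```python
# Sketch of the published-baseline arithmetic: measured Apple bpb plus a 40% margin.
measured_bpb = 1.64477742  # final_sliding_window_exact val_bpb from the Apple run
margin = 0.40              # safety margin applied to form the public baseline

published = measured_bpb * (1 + margin)
print(f"{published:.8f}")  # 2.30268839
print(f"{published:.4f}")  # 2.3027, the value shown in the README table
```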
Important caveat:
- that measured Apple run was a local reference run, not a valid record submission
- it finished with `total_submission_size_quant_zlib = 2,322,075` bytes, which is over the `2,000,000` byte cap
- until a tuned under-cap 4090 reference run is added, the README baseline should be read as a conservative starter target, not as the official best valid baseline
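For concreteness, here is a hypothetical sketch of the cap check; `over_cap` is an illustrative helper, not the repo's actual scorer:

```python
# Hypothetical artifact-cap check; the real scoring pipeline may differ.
CAP_BYTES = 2_000_000  # decimal bytes, not MiB


def over_cap(total_submission_size_quant_zlib: int) -> int:
    """Return how many bytes a submission exceeds the cap by (0 if under)."""
    return max(0, total_submission_size_quant_zlib - CAP_BYTES)


# The measured Apple run finished at 2,322,075 bytes:
print(over_cap(2_322_075))  # 322075 bytes over the cap, so not a valid record
```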
```bash
git clone <your fork or repo URL>
cd parameter-golf-local
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements-cuda.txt
python3 data/cached_challenge_fineweb.py --variant sp1024 --train-shards 2
bash scripts/run_cuda_baseline.sh configs/local_4090_baseline.env
```

That baseline targets the official local record lane: a single CUDA device with a 30 minute budget.
```bash
git clone <your fork or repo URL>
cd parameter-golf-local
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements-mlx.txt
python3 data/cached_challenge_fineweb.py --variant sp1024 --train-shards 2
bash scripts/run_mlx_baseline.sh configs/local_apple_mlx.env
```

The MLX path uses the same smaller model family and validation cap, but it is best treated as an open-hardware lane rather than a directly comparable record lane. Apple GPU generations vary too much to make them the primary reference target.
- The counted artifact is `code bytes + compressed model bytes`.
- The cap is decimal `2,000,000` bytes, not MiB.
- Training must stay within the declared wallclock budget for the lane you are claiming.
- Evaluation must use only the published validation split. No network calls or extra downloads are allowed during evaluation.
- The official record lane is `single-device CUDA` on hardware comparable to an RTX 4090.
- Apple / MLX submissions are welcome, but should be labeled as open-hardware or non-record unless they match the reference lane.
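The scoring metric is tokenizer-agnostic bits per byte: total loss converted from nats to bits, then normalized by the raw byte count of the validation text rather than by token count. A minimal sketch, assuming loss is summed in nats and the byte count is the split's UTF-8 size (the repo's exact scorer may differ in detail):

```python
import math


def bits_per_byte(total_nll_nats: float, total_utf8_bytes: int) -> float:
    """Tokenizer-agnostic compression score.

    total_nll_nats:   summed negative log-likelihood over the split, in nats
    total_utf8_bytes: raw UTF-8 byte count of the same split

    Dividing by bytes instead of tokens keeps the score comparable
    across models trained with different tokenizers.
    """
    return total_nll_nats / (math.log(2) * total_utf8_bytes)
```

For example, a model whose summed loss equals `8 * ln(2)` nats over an 8-byte split scores exactly 1.0 bpb.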
More detail is in docs/track_spec.md.
Add a new folder under `records/track_30min_2mb_local`. Each submission folder should contain:

- `README.md`
- `submission.json`
- `train.log`
- `train_gpt.py`
The included helper checks that shape:
```bash
bash scripts/check_submission_folder.sh records/track_30min_2mb_local/<your_run_folder>
```

A starter JSON template lives at `docs/submission_template.json`.
- configs/local_4090_baseline.env: official single-device CUDA baseline
- configs/local_apple_mlx.env: Apple Silicon MLX baseline
- configs/local_apple_smoke.env: short MLX smoke run
This repo intentionally keeps the same core data format and scoring philosophy as the larger challenge so ideas transfer cleanly back and forth. The local track is not meant to be toy-sized. It is meant to be cheap enough that people can iterate on their own hardware and still learn real lessons about architecture, optimization, and compression.