parameter-golf-local is a distilled spinout of the main Parameter Golf challenge for local hardware.
The goal is the same in spirit: train the best language model you can under a strict artifact budget, then score it by tokenizer-agnostic compression on held-out FineWeb data. The difference is that this version is scaled for hardware people can actually own.
| Setting | Local Track |
|---|---|
| Artifact cap | 2,000,000 bytes total |
| Reference training device | 1x RTX 4090 |
| Reference wallclock | 30 minutes |
| Official record lane | single-device CUDA |
| Starter Apple lane | MLX open-hardware / non-record |
| Training tokenizer | SentencePiece 1024 |
| Starter model | 6 layers, 192 dim, tied embeddings |
| Training sequence length | 512 |
| Default evaluation sequence length | 1024 |
| Default evaluation split | first 4,194,304 validation tokens |
| Default training download | first 2 train shards |
The local track shrinks four things together:
- model size
- training token throughput
- evaluation cost
- wallclock budget
That keeps the contest meaningfully about modeling and compression instead of turning it into a pure kernel race.
The repo now publishes a conservative starter baseline instead of leaving the README blank:
| Label | Score | Notes |
|---|---|---|
| Published baseline | 2.3027 | Conservative public baseline |
| Measured Apple run | 1.6448 | 30 min MLX run on an Apple M4 Pro, using 1 local train shard |
How the published baseline was chosen:
- the measured Apple run reached `final_sliding_window_exact val_bpb = 1.64477742`
- by policy, the public baseline is that score with 40% added
- published baseline score = `1.64477742 * 1.4 = 2.30268839`, shown as `2.3027`
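The margin arithmetic above is simple enough to sanity-check in a few lines (a sketch only; the variable names here are illustrative, not part of the repo):

```python
# Sketch of the published-baseline arithmetic: measured Apple bpb plus a 40% margin.
measured_bpb = 1.64477742  # final_sliding_window_exact val_bpb from the Apple run
margin = 0.40              # safety margin applied to form the public baseline

published = measured_bpb * (1 + margin)
print(f"{published:.8f}")  # 2.30268839
print(f"{published:.4f}")  # 2.3027, the value shown in the README table
```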
Important caveat:
- that measured Apple run was a local reference run, not a valid record submission
- it finished with `total_submission_size_quant_zlib = 2,322,075` bytes, which is over the `2,000,000` byte cap
- until a tuned under-cap 4090 reference run is added, the README baseline should be read as a conservative starter target, not as the official best valid baseline
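For concreteness, here is a hypothetical sketch of the cap check; `over_cap` is an illustrative helper, not the repo's actual scorer:

```python
# Hypothetical artifact-cap check; the real scoring pipeline may differ.
CAP_BYTES = 2_000_000  # decimal bytes, not MiB


def over_cap(total_submission_size_quant_zlib: int) -> int:
    """Return how many bytes a submission exceeds the cap by (0 if under)."""
    return max(0, total_submission_size_quant_zlib - CAP_BYTES)


# The measured Apple run finished at 2,322,075 bytes:
print(over_cap(2_322_075))  # 322075 bytes over the cap, so not a valid record
```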
```bash
git clone <your fork or repo URL>
cd parameter-golf-local
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements-cuda.txt
python3 data/cached_challenge_fineweb.py --variant sp1024 --train-shards 2
bash scripts/run_cuda_baseline.sh configs/local_4090_baseline.env
```

That baseline targets the official local record lane: a single CUDA device with a 30 minute budget.
```bash
git clone <your fork or repo URL>
cd parameter-golf-local
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements-mlx.txt
python3 data/cached_challenge_fineweb.py --variant sp1024 --train-shards 2
bash scripts/run_mlx_baseline.sh configs/local_apple_mlx.env
```

The MLX path uses the same smaller model family and validation cap, but it is best treated as an open-hardware lane rather than a directly comparable record lane. Apple GPU generations vary too much to make them the primary reference target.
- The counted artifact is `code bytes + compressed model bytes`.
- The cap is decimal `2,000,000` bytes, not MiB.
- Training must stay within the declared wallclock budget for the lane you are claiming.
- Evaluation must use only the published validation split. No network calls or extra downloads are allowed during evaluation.
- The official record lane is `single-device CUDA` on hardware comparable to an RTX 4090.
- Apple / MLX submissions are welcome, but should be labeled as open-hardware or non-record unless they match the reference lane.
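The scoring metric is tokenizer-agnostic bits per byte: total loss converted from nats to bits, then normalized by the raw byte count of the validation text rather than by token count. A minimal sketch, assuming loss is summed in nats and the byte count is the split's UTF-8 size (the repo's exact scorer may differ in detail):

```python
import math


def bits_per_byte(total_nll_nats: float, total_utf8_bytes: int) -> float:
    """Tokenizer-agnostic compression score.

    total_nll_nats:   summed negative log-likelihood over the split, in nats
    total_utf8_bytes: raw UTF-8 byte count of the same split

    Dividing by bytes instead of tokens keeps the score comparable
    across models trained with different tokenizers.
    """
    return total_nll_nats / (math.log(2) * total_utf8_bytes)
```

For example, a model whose summed loss equals `8 * ln(2)` nats over an 8-byte split scores exactly 1.0 bpb.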
More detail is in docs/track_spec.md.
Add a new folder under `records/track_30min_2mb_local`. Each submission folder should contain:

- `README.md`
- `submission.json`
- `train.log`
- `train_gpt.py`
The included helper checks that shape:
```bash
bash scripts/check_submission_folder.sh records/track_30min_2mb_local/<your_run_folder>
```

A starter JSON template lives at `docs/submission_template.json`.
- configs/local_4090_baseline.env: official single-device CUDA baseline
- configs/local_apple_mlx.env: Apple Silicon MLX baseline
- configs/local_apple_smoke.env: short MLX smoke run
This repo intentionally keeps the same core data format and scoring philosophy as the larger challenge so ideas transfer cleanly back and forth. The local track is not meant to be toy-sized. It is meant to be cheap enough that people can iterate on their own hardware and still learn real lessons about architecture, optimization, and compression.