GLDRoger/parameter-golf-local

Parameter Golf Local

parameter-golf-local is a distilled spinout of the main Parameter Golf challenge for local hardware.

The goal is the same in spirit: train the best language model you can under a strict artifact budget, then score it by tokenizer-agnostic compression on held-out FineWeb data. The difference is that this version is scaled for hardware people can actually own.

Track Summary

Setting                             Local Track
Artifact cap                        2,000,000 bytes total
Reference training device           1x RTX 4090
Reference wallclock                 30 minutes
Official record lane                single-device CUDA
Starter Apple lane                  MLX open-hardware / non-record
Training tokenizer                  SentencePiece 1024
Starter model                       6 layers, 192 dim, tied embeddings
Training sequence length            512
Default evaluation sequence length  1024
Default evaluation split            first 4,194,304 validation tokens
Default training download           first 2 train shards

The local track shrinks four things together:

  • model size
  • training token throughput
  • evaluation cost
  • wallclock budget

That keeps the contest meaningfully about modeling and compression instead of turning it into a pure kernel race.
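For concreteness, the tokenizer-agnostic score works out to bits per byte: total cross-entropy over the held-out text, converted to bits, divided by the UTF-8 byte count of that text. The sketch below is illustrative only; the function name and inputs are assumptions, not the official scoring code.

```python
import math

def bits_per_byte(total_nll_nats: float, total_utf8_bytes: int) -> float:
    # Convert summed cross-entropy from nats to bits, then normalize by
    # the raw UTF-8 byte count of the evaluated text. Dividing by bytes
    # rather than tokens is what makes the score tokenizer-agnostic:
    # a model with a bigger vocabulary cannot game the denominator.
    return total_nll_nats / (math.log(2) * total_utf8_bytes)
```

A model whose summed NLL is 1000 * ln(2) nats over a 1000-byte span scores exactly 1.0 bpb; lower is better.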

Provisional Baseline

The repo now publishes a conservative starter baseline instead of leaving the README blank:

Label               Score   Notes
Published baseline  2.3027  Conservative public baseline
Measured Apple run  1.6448  30 min MLX run on an Apple M4 Pro, using 1 local train shard

How the published baseline was chosen:

  • the measured Apple run reached final_sliding_window_exact val_bpb = 1.64477742
  • by repo policy, the public baseline is that score with 40% added
  • published baseline score = 1.64477742 * 1.4 = 2.30268839, shown as 2.3027
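That arithmetic, spelled out (the 40% margin is the repo's stated policy for the published number, not part of the scoring code):

```python
measured_val_bpb = 1.64477742  # final_sliding_window_exact val_bpb from the Apple run
margin = 1.4                   # published baseline adds 40% headroom

published_baseline = round(measured_val_bpb * margin, 4)
assert published_baseline == 2.3027  # the figure shown in the table above
```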

Important caveat:

  • that measured Apple run was a local reference run, not a valid record submission
  • it finished with total_submission_size_quant_zlib = 2,322,075 bytes, which is over the 2,000,000 byte cap
  • until a tuned under-cap 4090 reference run is added, the README baseline should be read as a conservative starter target, not as the official best valid baseline

Quickstart: CUDA / RTX 4090

git clone <your fork or repo URL>
cd parameter-golf-local
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements-cuda.txt

python3 data/cached_challenge_fineweb.py --variant sp1024 --train-shards 2

bash scripts/run_cuda_baseline.sh configs/local_4090_baseline.env

That baseline targets the official local record lane: a single CUDA device with a 30 minute budget.

Quickstart: Apple Silicon / MLX

git clone <your fork or repo URL>
cd parameter-golf-local
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements-mlx.txt

python3 data/cached_challenge_fineweb.py --variant sp1024 --train-shards 2

bash scripts/run_mlx_baseline.sh configs/local_apple_mlx.env

The MLX path uses the same smaller model family and validation cap, but it is best treated as an open-hardware lane rather than a directly comparable record lane. Apple GPU generations vary too much to make them the primary reference target.

Rules

  • The counted artifact is code bytes + compressed model bytes.
  • The cap is decimal 2,000,000 bytes, not MiB.
  • Training must stay within the declared wallclock budget for the lane you are claiming.
  • Evaluation must use only the published validation split. No network calls or extra downloads are allowed during evaluation.
  • The official record lane is single-device CUDA on hardware comparable to an RTX 4090.
  • Apple / MLX submissions are welcome, but should be labeled as open-hardware or non-record unless they match the reference lane.

More detail is in docs/track_spec.md.
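The byte accounting in the first two rules can be sketched as below. This is a hedged illustration, not the official checker: `counted_artifact_bytes` is a hypothetical helper, and the exact accounting (quantization scheme, compression settings) is defined in docs/track_spec.md. The zlib step mirrors the `total_submission_size_quant_zlib` figure the training runs report.

```python
import zlib
from pathlib import Path

CAP_BYTES = 2_000_000  # decimal bytes, not 2 MiB

def counted_artifact_bytes(code_paths: list[str], raw_weights: bytes) -> int:
    # Code files count at their raw on-disk size; model weights count
    # after zlib compression of the serialized (quantized) weight bytes.
    code_bytes = sum(Path(p).stat().st_size for p in code_paths)
    model_bytes = len(zlib.compress(raw_weights, level=9))
    return code_bytes + model_bytes

def under_cap(code_paths: list[str], raw_weights: bytes) -> bool:
    return counted_artifact_bytes(code_paths, raw_weights) <= CAP_BYTES
```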

Submission Process

Add a new folder under records/track_30min_2mb_local. Each submission folder should contain:

  1. README.md
  2. submission.json
  3. train.log
  4. train_gpt.py

The included helper verifies that layout:

bash scripts/check_submission_folder.sh records/track_30min_2mb_local/<your_run_folder>

A starter JSON template lives at docs/submission_template.json.
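The folder check amounts to confirming that the four required files exist. A minimal Python equivalent of that check is sketched below; `scripts/check_submission_folder.sh` remains the authoritative validator, and the function name here is an assumption.

```python
from pathlib import Path

# The four files every submission folder must contain, per the list above.
REQUIRED_FILES = ["README.md", "submission.json", "train.log", "train_gpt.py"]

def missing_submission_files(folder: str) -> list[str]:
    # Return the required files that are absent, so an empty list
    # means the folder has the expected shape.
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```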

Starter Presets

Design Notes

This repo intentionally keeps the same core data format and scoring philosophy as the larger challenge so ideas transfer cleanly back and forth. The local track is not meant to be toy-sized. It is meant to be cheap enough that people can iterate on their own hardware and still learn real lessons about architecture, optimization, and compression.
