
Arch222/Addition_Final


Handwritten Strict Transformer Adder (<100 params)

This implementation is:

  • a decoder-only transformer
  • handwritten, with no training
  • autoregressive (argmax over the model logits at each step)
  • free of any symbolic/carry-solver branch at inference
  • calibrated for 10-digit + 10-digit addition

Parameter budget

  • Counted parameters (nn.Parameter): 22
  • Trainable parameters: 0
  • Stored weight buffers: 0

Architecture

  • 1 decoder layer
  • hidden size = 3
  • attention heads = 4
  • KV heads = 1
  • head dim = 2
  • MLP hidden = 4
  • vocab size = 10 (digit tokens only)
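The hyperparameters above can be collected into a config sketch. Note that with 4 query heads sharing 1 KV head this is grouped-query attention, and that n_heads × head_dim (8) deliberately differs from the hidden size (3), so the attention projections change dimensionality. Field names below are illustrative, not the repository's actual code:

```python
from dataclasses import dataclass

# Hypothetical config mirroring the numbers listed in the README.
@dataclass
class AdderConfig:
    n_layers: int = 1
    d_model: int = 3       # hidden size
    n_heads: int = 4       # query heads
    n_kv_heads: int = 1    # grouped-query attention: 4 query heads, 1 KV head
    head_dim: int = 2      # n_heads * head_dim = 8 != d_model = 3
    d_mlp: int = 4
    vocab_size: int = 10   # digit tokens only

cfg = AdderConfig()
# GQA requires the query-head count to divide evenly into KV groups.
assert cfg.n_heads % cfg.n_kv_heads == 0
```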

The compressed handwritten design follows a reference-style setup:

  • large constant embedding channel for stable RMSNorm
  • RoPE offset-targeted queries
  • attention extracts previous/current aligned digits
  • MLP implements carry/overflow logic via thresholded linear pieces
  • tied embedding decode produces digit logits
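The carry/overflow logic that the MLP's thresholded linear pieces emulate is ordinary grade-school addition over least-significant-digit-first sequences. A plain-Python reference (not the model itself), producing 11 output digits for two 10-digit operands, with the final digit being the overflow:

```python
# Reference carry logic: add two reversed (LSB-first) digit sequences.
def add_reversed(a_digits, b_digits):
    out, carry = [], 0
    for a, b in zip(a_digits, b_digits):
        s = a + b + carry
        out.append(s % 10)   # aligned output digit
        carry = s // 10      # threshold behavior: carry is 1 iff s >= 10
    out.append(carry)        # 11th digit: final overflow
    return out

# 9999999999 + 1000000000 -> maximum carry propagation
print(add_reversed([9] * 10, [1] + [0] * 9))  # [0]*10 + [1]
```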

Prompt / output format

Prompt tokens:

[0] + reverse(a_10_digits) + [0] + [0] + reverse(b_10_digits) + [0]

Generated tokens:

11 reversed sum digits (fixed length).
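The prompt layout above can be sketched as a small builder. Token ids are the digits themselves (vocab of 10), operands are zero-padded to 10 digits and reversed so the model reads least-significant digits first, and the `[0]` separators are those stated in the format; the helper name is illustrative:

```python
# Build the 24-token prompt: [0] + reverse(a) + [0] + [0] + reverse(b) + [0]
def build_prompt(a: int, b: int, n_digits: int = 10):
    def rev(x):
        # zero-pad to n_digits, then reverse to LSB-first order
        return [int(d) for d in str(x).zfill(n_digits)][::-1]
    return [0] + rev(a) + [0, 0] + rev(b) + [0]

prompt = build_prompt(123, 456)
# 1 + 10 + 2 + 10 + 1 = 24 tokens
```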

Generate held-out set

python generate_test_cases.py --n-digits 10 --size 100000 --seed 12345 --out data/heldout_autoreg_10digit.jsonl

Evaluate

python evaluate.py --cases data/heldout_autoreg_10digit.jsonl --n-digits 10 --batch-size 2048

Observed:

  • total_parameters=22
  • accuracy=1.000000 on 100000 held-out cases

Quick stress

python stress_boundaries.py --digit-sizes 10 --cases-per-size 2000 --batch-size 1024

n_digits > 10 is intentionally unsupported by this handwritten weight set.

About

New submission (includes only the final verifier code). Authored entirely with Codex.
