A minimal GPT implementation in pure Go with no external dependencies.
Inspired by Karpathy's minGPT — the most atomic way to train and run inference for a GPT, ported from Python to idiomatic Go.
- Autograd engine — reverse-mode automatic differentiation over scalar values
- Character-level tokenizer — maps characters to token IDs with BOS support
- Transformer model — GPT-2-style architecture with multi-head attention, RMSNorm, and ReLU
- Adam optimizer — with bias correction and linear learning rate decay
- Training + Inference — trains on a names dataset, generates new hallucinated names
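
To make "reverse-mode automatic differentiation over scalar values" concrete, here is a minimal sketch of how such an engine can work. The `Value` type, its fields, and the `Add`/`Mul` helpers below are illustrative assumptions and may not match the actual API in `autograd/`.

```go
package main

import "fmt"

// Value is a hypothetical scalar node in the computation graph.
type Value struct {
	Data     float64
	Grad     float64
	children []*Value
	backward func() // pushes this node's Grad into its children
}

// NewValue creates a leaf node with no children.
func NewValue(x float64) *Value {
	return &Value{Data: x, backward: func() {}}
}

// Add returns a + b and records the local gradient rule (d/da = d/db = 1).
func Add(a, b *Value) *Value {
	out := &Value{Data: a.Data + b.Data, children: []*Value{a, b}}
	out.backward = func() {
		a.Grad += out.Grad
		b.Grad += out.Grad
	}
	return out
}

// Mul returns a * b and records the local gradient rule (d/da = b, d/db = a).
func Mul(a, b *Value) *Value {
	out := &Value{Data: a.Data * b.Data, children: []*Value{a, b}}
	out.backward = func() {
		a.Grad += b.Data * out.Grad
		b.Grad += a.Data * out.Grad
	}
	return out
}

// Backward sorts the graph topologically, then applies the chain rule
// from the output back to the leaves.
func (v *Value) Backward() {
	var topo []*Value
	visited := map[*Value]bool{}
	var build func(n *Value)
	build = func(n *Value) {
		if visited[n] {
			return
		}
		visited[n] = true
		for _, c := range n.children {
			build(c)
		}
		topo = append(topo, n)
	}
	build(v)
	v.Grad = 1
	for i := len(topo) - 1; i >= 0; i-- {
		topo[i].backward()
	}
}

func main() {
	a, b := NewValue(2), NewValue(3)
	y := Add(Mul(a, b), a) // y = a*b + a
	y.Backward()
	fmt.Println(y.Data, a.Grad, b.Grad) // prints: 8 4 2
}
```

Gradients are accumulated with `+=` so that a value used more than once (like `a` above) receives contributions from every path through the graph.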
├── autograd/ # Scalar autograd engine (Value type + backprop)
├── tokenizer/ # Character-level tokenizer
├── model/
│ ├── layers.go # Linear, Softmax, RMSNorm primitives
│ ├── gpt.go # GPT forward pass with KV cache
│ └── state.go # Model config and parameter initialization
├── training/ # Adam optimizer
├── data/ # Dataset loader (auto-downloads names.txt)
└── main.go # Training loop + inference
go build -o go-gpt .
./go-gpt

The program will:
- Download the names dataset (32K names) if not present
- Train a tiny GPT (1 layer, 16-dim embeddings, 4 heads) for 1000 steps
- Generate 20 new hallucinated names
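
The training step relies on Adam with bias correction and linear learning rate decay. The sketch below shows one way that update can be written over plain `float64` slices. The `Adam` type, the hyperparameter values (`Beta1`, `Beta2`, `Eps`, the base learning rate), and the toy objective are assumptions for illustration, not the values or API used in `training/`.

```go
package main

import (
	"fmt"
	"math"
)

// Adam holds first and second moment estimates for each parameter.
// The default hyperparameters here are illustrative assumptions.
type Adam struct {
	LR, Beta1, Beta2, Eps float64
	m, v                  []float64
}

// NewAdam allocates optimizer state for n parameters.
func NewAdam(n int, lr float64) *Adam {
	return &Adam{
		LR: lr, Beta1: 0.9, Beta2: 0.999, Eps: 1e-8,
		m: make([]float64, n), v: make([]float64, n),
	}
}

// Step updates params in place. step is zero-based and totalSteps is the
// length of the run, so the learning rate decays linearly toward zero.
func (o *Adam) Step(params, grads []float64, step, totalSteps int) {
	lr := o.LR * (1 - float64(step)/float64(totalSteps))
	t := float64(step + 1)
	for i := range params {
		o.m[i] = o.Beta1*o.m[i] + (1-o.Beta1)*grads[i]
		o.v[i] = o.Beta2*o.v[i] + (1-o.Beta2)*grads[i]*grads[i]
		// Bias correction compensates for the zero-initialized moments.
		mHat := o.m[i] / (1 - math.Pow(o.Beta1, t))
		vHat := o.v[i] / (1 - math.Pow(o.Beta2, t))
		params[i] -= lr * mHat / (math.Sqrt(vHat) + o.Eps)
	}
}

func main() {
	// Toy objective: minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
	x := []float64{0}
	opt := NewAdam(1, 0.1)
	const steps = 1000
	for s := 0; s < steps; s++ {
		grad := []float64{2 * (x[0] - 3)}
		opt.Step(x, grad, s, steps)
	}
	fmt.Printf("x = %.3f\n", x[0])
}
```

Without bias correction, the first few updates would be scaled down sharply because `m` and `v` start at zero. With it, the very first step has a magnitude close to the learning rate regardless of the gradient's scale.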
| Parameter | Value |
|---|---|
| Layers | 1 |
| Embedding dim | 16 |
| Attention heads | 4 |
| Context length | 16 |
| Vocab size | 27 (a-z + BOS) |
This is deliberately tiny — the goal is clarity, not performance. Everything runs on scalar autograd, so training is slow but the code is readable.
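
The vocabulary of 27 tokens (a-z plus BOS) can be illustrated with a small character-level tokenizer. This is a sketch under assumptions: the ID chosen for BOS (26), the use of BOS at both ends of a name, and the `Encode`/`Decode` function names are hypothetical and may differ from the code in `tokenizer/`.

```go
package main

import "fmt"

// BOS is the assumed ID of the beginning-of-sequence token. Letters a-z
// take IDs 0-25, giving a vocabulary of 27 tokens in total.
const BOS = 26

// VocabSize matches the configuration table: 26 letters + BOS.
const VocabSize = 27

// Encode maps a lowercase name to token IDs. As an assumption, the
// sequence is wrapped in BOS on both sides so BOS also marks the end.
func Encode(s string) ([]int, error) {
	ids := []int{BOS}
	for _, r := range s {
		if r < 'a' || r > 'z' {
			return nil, fmt.Errorf("unsupported character %q", r)
		}
		ids = append(ids, int(r-'a'))
	}
	return append(ids, BOS), nil
}

// Decode maps token IDs back to text, skipping BOS and out-of-range IDs.
func Decode(ids []int) string {
	out := make([]rune, 0, len(ids))
	for _, id := range ids {
		if id >= 0 && id < BOS {
			out = append(out, rune('a'+id))
		}
	}
	return string(out)
}

func main() {
	ids, err := Encode("emma")
	if err != nil {
		panic(err)
	}
	fmt.Println(ids)         // prints: [26 4 12 12 0 26]
	fmt.Println(Decode(ids)) // prints: emma
}
```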
Differences from the standard GPT-2 architecture:

- LayerNorm → RMSNorm
- GELU → ReLU
- No biases
- Character-level tokenization (no BPE)
- Scalar autograd (no tensor ops)
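
As a rough illustration of the first two substitutions, here is a sketch of RMSNorm and ReLU over plain `float64` slices. It assumes no learned gain, no bias, and an epsilon of `1e-5`. The real layers in `model/layers.go` operate on autograd values and may differ in these details.

```go
package main

import (
	"fmt"
	"math"
)

// RMSNorm rescales x so that its root mean square is approximately 1.
// Unlike LayerNorm, it does not subtract the mean. The epsilon and the
// absence of a learned gain are assumptions of this sketch.
func RMSNorm(x []float64) []float64 {
	const eps = 1e-5
	var sumSquares float64
	for _, v := range x {
		sumSquares += v * v
	}
	scale := 1 / math.Sqrt(sumSquares/float64(len(x))+eps)
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = v * scale
	}
	return out
}

// ReLU zeroes negative entries and passes positive ones through unchanged.
func ReLU(x []float64) []float64 {
	out := make([]float64, len(x))
	for i, v := range x {
		if v > 0 {
			out[i] = v
		}
	}
	return out
}

func main() {
	x := []float64{3, -4}
	fmt.Println(RMSNorm(x))
	fmt.Println(ReLU(x)) // prints: [3 0]
}
```

Both operations are cheaper to differentiate than their GPT-2 counterparts, which matters when every multiplication is a separate node in a scalar autograd graph.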