This repository documents my journey of implementing a GPT model from the ground up.
Every step is explained in Jupyter notebooks with code, notes, and experiments, making it a resource for anyone curious about how GPTs really work under the hood.

Topics:
- micrograd (autograd engine from scratch)
- makemore Part 1 (building a character-level language model)
- Tokenizer and transformer implementation
- Profiling and benchmarking
- Custom Triton kernels (e.g., FlashAttention2)
- Distributed and memory-efficient training
- Scaling experiments
- Data preprocessing and filtering from raw sources
- Alignment methods: supervised finetuning, reinforcement learning, and DPO
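
To give a flavor of the first topic, here is a minimal sketch of a scalar autograd engine in the spirit of micrograd. It is not the code from the notebooks: the `Value` class, its `_parents` and `_backward` attributes, and the tiny example at the end are assumptions made for illustration, and only addition and multiplication are supported.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._backward = lambda: None    # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph so each node runs after its consumers.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


a = Value(2.0)
b = Value(3.0)
c = a * b + a      # c = a*b + a
c.backward()       # fills in dc/da = b + 1 and dc/db = a
print(c.data, a.grad, b.grad)
```

Gradients are accumulated with `+=` rather than assigned, so a value that is used more than once (like `a` above) collects the contribution from every path.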
I'm following Andrej Karpathy’s neural net series for inspiration and guidance.
Can’t thank him enough for making this stuff feel fun instead of intimidating.