A lightweight GPT-style language model implementation in PyTorch for learning and experimentation with transformer architectures. This project trains a small-scale GPT model on simple English phrases to understand language generation fundamentals.
This project implements a miniature GPT (Generative Pre-trained Transformer) model from scratch using PyTorch. It's designed for educational purposes and experimentation with transformer-based language models.
- Custom GPT Architecture: Implements a small-scale GPT with configurable parameters
- Multi-Head Self-Attention: Full implementation of transformer attention mechanisms
- Layer Normalization & GELU: Modern architectural components
- Training Pipeline: Complete training loop with data loading and optimization
- Text Generation: Generate text based on trained model checkpoints
- Jupyter Notebooks: Interactive notebooks for training and inference
The model uses the following configuration (defined in picogpt/config.json):
```json
{
  "vocab_size": 50257,
  "emb_dim": 256,
  "context_length": 128,
  "n_heads": 4,
  "n_layers": 4,
  "drop_rate": 0.1,
  "qkv_bias": true
}
```

- Token & Position Embeddings: Learned embeddings for input tokens and positions
- Transformer Blocks: 4 layers with multi-head self-attention
- Feed-Forward Networks: Position-wise FFN with GELU activation
- Layer Normalization: Pre-normalization for stable training
- Dropout: Regularization to prevent overfitting
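The components above combine into a pre-norm transformer block. The sketch below uses PyTorch's built-in `nn.MultiheadAttention` for brevity; the notebook implements the attention from scratch, so its class and variable names will differ:

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Pre-norm block: LayerNorm -> causal self-attention -> residual,
    then LayerNorm -> GELU feed-forward -> residual."""
    def __init__(self, emb_dim, n_heads, drop_rate, qkv_bias):
        super().__init__()
        self.norm1 = nn.LayerNorm(emb_dim)
        self.attn = nn.MultiheadAttention(emb_dim, n_heads, dropout=drop_rate,
                                          bias=qkv_bias, batch_first=True)
        self.norm2 = nn.LayerNorm(emb_dim)
        self.ffn = nn.Sequential(
            nn.Linear(emb_dim, 4 * emb_dim),  # standard 4x FFN expansion
            nn.GELU(),
            nn.Linear(4 * emb_dim, emb_dim),
        )
        self.drop = nn.Dropout(drop_rate)

    def forward(self, x):
        # Causal mask: each position may only attend to earlier positions
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + self.drop(attn_out)
        x = x + self.drop(self.ffn(self.norm2(x)))
        return x

# Using the config values from picogpt/config.json
block = TransformerBlock(emb_dim=256, n_heads=4, drop_rate=0.1, qkv_bias=True)
out = block(torch.randn(2, 8, 256))  # (batch, seq_len, emb_dim)
print(out.shape)                     # torch.Size([2, 8, 256])
```

The full model stacks `n_layers` of these blocks between the embedding layers and the output head.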
```
picogpt-python/
├── model.ipynb                  # Main training notebook
├── model_load.ipynb             # Model loading and inference notebook
├── picogpt.pt                   # Trained model checkpoint
├── picogpt/
│   └── config.json              # Model configuration
├── simple_english_phrases.txt   # Training data
├── LICENSE                      # License file
└── README.md                    # This file
```
- Python 3.8+
- PyTorch
- tiktoken (OpenAI's tokenizer)
- tqdm (for progress bars)
- Jupyter Notebook
```bash
# Clone the repository
git clone https://github.com/nazimboudeffa/picogpt-python.git
cd picogpt-python

# Install dependencies
pip install torch tiktoken tqdm jupyter
```

Open and run `model.ipynb` in Jupyter Notebook:

```bash
jupyter notebook model.ipynb
```

The notebook includes:
- Model architecture definition
- Data loading from `simple_english_phrases.txt`
- Training loop with loss tracking
- Model checkpoint saving
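A stripped-down version of that training loop looks like the following. The function and the stand-in model are illustrative, not the notebook's own code:

```python
import torch
import torch.nn.functional as F

def train_epoch(model, loader, optimizer, device="cpu"):
    """One pass over the data: next-token cross-entropy on shifted targets."""
    model.train()
    total_loss = 0.0
    for inputs, targets in loader:  # each: (batch, seq_len) token IDs
        inputs, targets = inputs.to(device), targets.to(device)
        logits = model(inputs)      # (batch, seq_len, vocab_size)
        loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss / len(loader)

# Tiny stand-in model and data just to show the loop runs end to end
vocab = 16
model = torch.nn.Sequential(torch.nn.Embedding(vocab, 32),
                            torch.nn.Linear(32, vocab))
data = [(torch.randint(0, vocab, (4, 8)), torch.randint(0, vocab, (4, 8)))]
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
avg = train_epoch(model, data, opt)
print(f"avg loss: {avg:.3f}")
```

In the notebook the stand-in model is `GPTModel` and the loader yields windows cut from the tokenized phrase file.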
Open `model_load.ipynb` to load the model and generate text:

```bash
jupyter notebook model_load.ipynb
```

This notebook demonstrates:
- Loading saved model weights (`picogpt.pt`)
- Text generation with different parameters
- Temperature-based sampling
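Temperature-based sampling divides the logits by the temperature before the softmax: values below 1 sharpen the distribution toward the likeliest token, values above 1 flatten it. A minimal sketch of the idea, separate from the notebook's `generate` function:

```python
import torch

def sample_next_token(logits, temperature=1.0):
    """Pick the next token ID from a 1-D logits vector.
    temperature < 1 sharpens the distribution, > 1 flattens it;
    temperature == 0 falls back to greedy argmax."""
    if temperature == 0.0:
        return int(torch.argmax(logits))
    probs = torch.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))

logits = torch.tensor([2.0, 1.0, 0.1])
print(sample_next_token(logits, temperature=0.0))  # 0 (greedy pick)
print(sample_next_token(logits, temperature=0.8))  # usually 0, sometimes 1 or 2
```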
The project uses simple_english_phrases.txt containing simple sentence patterns with:
- Basic subjects (The cat, My dog, The teacher, etc.)
- Common verbs (eats, runs, jumps, sleeps, etc.)
- Simple objects and locations
This simplified dataset helps the model learn basic grammatical structures.
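Turning the phrase file into training pairs is typically done with a sliding window over the token stream, where each target sequence is the input shifted one position ahead. A dependency-free sketch, with a toy word-level vocabulary standing in for the GPT-2 BPE tokenizer the project actually uses:

```python
def make_windows(token_ids, context_length):
    """Slide a fixed-size window over the token stream to build
    (input, target) pairs; targets are inputs shifted by one token."""
    pairs = []
    for i in range(len(token_ids) - context_length):
        x = token_ids[i : i + context_length]
        y = token_ids[i + 1 : i + context_length + 1]
        pairs.append((x, y))
    return pairs

# Toy "tokenization": one ID per word (stand-in for tiktoken's GPT-2 BPE)
text = "The cat eats . My dog runs ."
vocab = {w: i for i, w in enumerate(dict.fromkeys(text.split()))}
ids = [vocab[w] for w in text.split()]

pairs = make_windows(ids, context_length=4)
print(pairs[0])  # ([0, 1, 2, 3], [1, 2, 3, 4])
```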
Modify picogpt/config.json to experiment with different model sizes:
- `vocab_size`: Size of the vocabulary (default: 50257 for the GPT-2 tokenizer)
- `emb_dim`: Embedding dimension (increase for larger models)
- `context_length`: Maximum sequence length
- `n_heads`: Number of attention heads
- `n_layers`: Number of transformer layers
- `drop_rate`: Dropout probability
- `qkv_bias`: Whether to use bias in the attention projections
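Since `emb_dim` must split evenly across `n_heads`, it is worth validating the file after editing it. A small sketch (the inline JSON here stands in for reading `picogpt/config.json` from disk):

```python
import json

# Same shape as picogpt/config.json; in the notebook you would use
# json.load(open("picogpt/config.json")) instead
raw = """{
  "vocab_size": 50257, "emb_dim": 256, "context_length": 128,
  "n_heads": 4, "n_layers": 4, "drop_rate": 0.1, "qkv_bias": true
}"""
cfg = json.loads(raw)

# Each attention head gets emb_dim // n_heads dimensions
assert cfg["emb_dim"] % cfg["n_heads"] == 0, "emb_dim must divide across heads"
head_dim = cfg["emb_dim"] // cfg["n_heads"]
print(head_dim)  # 64
```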
```python
import torch
from model import GPTModel, generate
import tiktoken

# Load configuration
cfg = {...}  # Your config

# Initialize model
model = GPTModel(cfg)
model.load_state_dict(torch.load('picogpt.pt'))
model.eval()

# Generate text
tokenizer = tiktoken.get_encoding("gpt2")
prompt = "The cat"
generated_text = generate(model, tokenizer, prompt, max_new_tokens=50, temperature=0.8)
print(generated_text)
```

- Incoherent outputs:
  - Train for more epochs (20+ recommended)
  - Lower the learning rate (1e-4)
  - Increase the size of the training data
- High loss:
  - Ensure the loss decreases progressively (from ~7 to ~2)
  - Check the data preprocessing
  - Verify the model architecture
- Out of memory:
  - Reduce `context_length`
  - Decrease `emb_dim` or `n_layers`
  - Use smaller batch sizes
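On judging loss values: with the GPT-2 vocabulary, a model that predicts uniformly at random scores about ln(50257) ≈ 10.8 nats, so random (untrained) logits should land near that baseline and training should pull the loss well below it. A quick sanity check (variable names illustrative):

```python
import math
import torch
import torch.nn.functional as F

vocab_size = 50257
batch, seq_len = 2, 16

# Random logits approximate an untrained model; their cross-entropy
# should sit close to the uniform baseline ln(vocab_size)
logits = torch.randn(batch, seq_len, vocab_size)
targets = torch.randint(0, vocab_size, (batch, seq_len))
loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())

baseline = math.log(vocab_size)  # ≈ 10.8
print(f"loss {loss.item():.2f} vs uniform baseline {baseline:.2f}")
```

If the loss starts far above the baseline or refuses to drop below it, suspect the data pipeline or target alignment rather than the model size.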
- Add learning rate scheduling
- Implement gradient clipping
- Add validation set and metrics
- Support for custom tokenizers
- Multi-language support (French, etc.)
- Beam search for generation
- Model quantization for deployment
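Of the roadmap items above, learning-rate scheduling and gradient clipping are small additions to the existing training loop. A sketch using PyTorch's built-ins (the stand-in model, loss, and hyperparameters are illustrative):

```python
import torch

model = torch.nn.Linear(8, 8)  # stand-in for GPTModel
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# Cosine decay of the learning rate over 100 steps
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

for step in range(3):  # inside the training loop
    loss = model(torch.randn(4, 8)).pow(2).mean()  # placeholder loss
    optimizer.zero_grad()
    loss.backward()
    # Clip the global gradient norm to 1.0 before the optimizer step
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    scheduler.step()  # decay the LR once per step

print(f"lr after 3 steps: {scheduler.get_last_lr()[0]:.2e}")
```

Clipping caps the update size when a batch produces an unusually large gradient, and the scheduler eases the learning rate down as training converges.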
This project is open source and available under the MIT License.
Contributions are welcome! Feel free to:
- Report bugs
- Suggest features
- Submit pull requests
- Attention Is All You Need
- Language Models are Unsupervised Multitask Learners (GPT-2)
- picoGPT - Inspiration for this implementation
Nazim Boudeffa
- GitHub: @nazimboudeffa
- Repository: picogpt-python
Built with ❤️ for learning and experimenting with transformer models.