PicoGPT Python

A lightweight GPT-style language model implementation in PyTorch for learning and experimentation with transformer architectures. The project trains a small-scale GPT model on simple English phrases to illustrate the fundamentals of language generation.

🎯 Overview

This project implements a miniature GPT (Generative Pre-trained Transformer) model from scratch using PyTorch. It's designed for educational purposes and experimentation with transformer-based language models.

✨ Features

  • Custom GPT Architecture: Implements a small-scale GPT with configurable parameters
  • Multi-Head Self-Attention: Full implementation of transformer attention mechanisms
  • Layer Normalization & GELU: Modern architectural components
  • Training Pipeline: Complete training loop with data loading and optimization
  • Text Generation: Generate text based on trained model checkpoints
  • Jupyter Notebooks: Interactive notebooks for training and inference

🏗️ Model Architecture

The model uses the following configuration (defined in picogpt/config.json):

{
  "vocab_size": 50257,
  "emb_dim": 256,
  "context_length": 128,
  "n_heads": 4,
  "n_layers": 4,
  "drop_rate": 0.1,
  "qkv_bias": true
}

Architecture Components

  • Token & Position Embeddings: Learned embeddings for input tokens and positions
  • Transformer Blocks: 4 layers with multi-head self-attention
  • Feed-Forward Networks: Position-wise FFN with GELU activation
  • Layer Normalization: Pre-normalization for stable training
  • Dropout: Regularization to prevent overfitting
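
The notebook defines these components itself; as a rough, minimal sketch (not the notebook's exact code), a pre-norm block and model matching this configuration could look like the following. The class names and the use of nn.MultiheadAttention are assumptions here, since model.ipynb may implement the attention math by hand.

import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Pre-norm block: LayerNorm -> causal attention -> residual,
    then LayerNorm -> feed-forward (GELU) -> residual."""
    def __init__(self, cfg):
        super().__init__()
        self.norm1 = nn.LayerNorm(cfg["emb_dim"])
        self.attn = nn.MultiheadAttention(
            embed_dim=cfg["emb_dim"], num_heads=cfg["n_heads"],
            dropout=cfg["drop_rate"], bias=cfg["qkv_bias"], batch_first=True)
        self.norm2 = nn.LayerNorm(cfg["emb_dim"])
        self.ff = nn.Sequential(
            nn.Linear(cfg["emb_dim"], 4 * cfg["emb_dim"]),
            nn.GELU(),
            nn.Linear(4 * cfg["emb_dim"], cfg["emb_dim"]))
        self.drop = nn.Dropout(cfg["drop_rate"])

    def forward(self, x):
        # Causal mask: each position may only attend to earlier positions
        seq_len = x.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                     device=x.device), diagonal=1)
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask, need_weights=False)
        x = x + self.drop(attn_out)
        x = x + self.drop(self.ff(self.norm2(x)))
        return x

class GPTModel(nn.Module):
    """Token + position embeddings, n_layers transformer blocks, LM head."""
    def __init__(self, cfg):
        super().__init__()
        self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"])
        self.pos_emb = nn.Embedding(cfg["context_length"], cfg["emb_dim"])
        self.drop = nn.Dropout(cfg["drop_rate"])
        self.blocks = nn.ModuleList(
            [TransformerBlock(cfg) for _ in range(cfg["n_layers"])])
        self.norm = nn.LayerNorm(cfg["emb_dim"])
        self.lm_head = nn.Linear(cfg["emb_dim"], cfg["vocab_size"], bias=False)

    def forward(self, idx):
        # idx: (batch, seq_len) token ids
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.drop(self.tok_emb(idx) + self.pos_emb(pos))
        for block in self.blocks:
            x = block(x)
        return self.lm_head(self.norm(x))  # (batch, seq_len, vocab_size)

With the default configuration above (and untied embedding/output matrices) this works out to roughly 29 million parameters, most of them in the two 50257 × 256 vocabulary matrices.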

📁 Project Structure

picogpt-python/
├── model.ipynb              # Main training notebook
├── model_load.ipynb         # Model loading and inference notebook
├── picogpt.pt               # Trained model checkpoint
├── picogpt/
│   └── config.json          # Model configuration
├── simple_english_phrases.txt  # Training data
├── LICENSE                  # License file
└── README.md               # This file

🚀 Getting Started

Prerequisites

  • Python 3.8+
  • PyTorch
  • tiktoken (OpenAI's tokenizer)
  • tqdm (for progress bars)
  • Jupyter Notebook

Installation

# Clone the repository
git clone https://github.com/nazimboudeffa/picogpt-python.git
cd picogpt-python

# Install dependencies
pip install torch tiktoken tqdm jupyter

Training the Model

Open and run model.ipynb in Jupyter Notebook:

jupyter notebook model.ipynb

The notebook includes:

  1. Model architecture definition
  2. Data loading from simple_english_phrases.txt
  3. Training loop with loss tracking
  4. Model checkpoint saving
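
As a minimal sketch of what such a loop typically looks like (next-token cross-entropy with AdamW), assuming a DataLoader named train_loader that yields (input, target) windows; the exact code lives in model.ipynb:

import torch
import torch.nn.functional as F

def train(model, train_loader, epochs=20, lr=1e-4, device="cpu"):
    """Minimal next-token-prediction training loop."""
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for epoch in range(epochs):
        total_loss = 0.0
        for inputs, targets in train_loader:   # each (batch, context_length)
            inputs, targets = inputs.to(device), targets.to(device)
            logits = model(inputs)              # (batch, seq_len, vocab_size)
            loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        print(f"epoch {epoch + 1}: loss {total_loss / len(train_loader):.3f}")
    torch.save(model.state_dict(), "picogpt.pt")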

Using a Trained Model

Open model_load.ipynb to load and generate text:

jupyter notebook model_load.ipynb

This notebook demonstrates:

  • Loading saved model weights (picogpt.pt)
  • Text generation with different parameters
  • Temperature-based sampling
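
Temperature-based sampling divides the logits by the temperature before the softmax, so values below 1.0 make generation more deterministic and values above 1.0 make it more varied. A minimal sketch of such a generate function (the signature matches the usage example below, but the notebook's implementation may differ):

import torch

@torch.no_grad()
def generate(model, tokenizer, prompt, max_new_tokens=50, temperature=0.8,
             context_length=128):
    """Sample tokens autoregressively, scaling logits by 1/temperature."""
    model.eval()
    idx = torch.tensor([tokenizer.encode(prompt)])
    for _ in range(max_new_tokens):
        logits = model(idx[:, -context_length:])  # crop to the context window
        logits = logits[:, -1, :] / temperature   # last position only
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_id], dim=1)
    return tokenizer.decode(idx[0].tolist())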

📊 Training Data

The project trains on simple_english_phrases.txt, which contains simple sentence patterns built from:

  • Basic subjects (The cat, My dog, The teacher, etc.)
  • Common verbs (eats, runs, jumps, sleeps, etc.)
  • Simple objects and locations

This simplified dataset helps the model learn basic grammatical structures.
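
A hedged sketch of how such a phrase file can be turned into training batches (the notebook may organize this step differently): tokenize the text with tiktoken and cut it into fixed-length windows whose targets are the same windows shifted by one token.

import tiktoken
import torch
from torch.utils.data import Dataset, DataLoader

class PhraseDataset(Dataset):
    """Slide a fixed-length window over the tokenized corpus; the target is
    the input shifted by one token (next-token prediction)."""
    def __init__(self, path, context_length=128, stride=128):
        text = open(path, encoding="utf-8").read()
        tokenizer = tiktoken.get_encoding("gpt2")
        ids = tokenizer.encode(text)
        self.samples = [
            (torch.tensor(ids[i:i + context_length]),
             torch.tensor(ids[i + 1:i + context_length + 1]))
            for i in range(0, len(ids) - context_length, stride)
        ]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, i):
        return self.samples[i]

loader = DataLoader(PhraseDataset("simple_english_phrases.txt"),
                    batch_size=8, shuffle=True)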

🔧 Configuration

Modify picogpt/config.json to experiment with different model sizes:

  • vocab_size: Size of the vocabulary (default: 50257 for GPT-2 tokenizer)
  • emb_dim: Embedding dimension (increase for larger models)
  • context_length: Maximum sequence length
  • n_heads: Number of attention heads
  • n_layers: Number of transformer layers
  • drop_rate: Dropout probability
  • qkv_bias: Whether to use bias in attention projections
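
For example, a quick way to load the configuration and try a smaller variant (assuming the GPTModel class from the training notebook is available in the session):

import json

# Load the configuration and shrink the model for a quick experiment
with open("picogpt/config.json") as f:
    cfg = json.load(f)

cfg["emb_dim"] = 128   # smaller embeddings
cfg["n_layers"] = 2    # fewer transformer blocks

model = GPTModel(cfg)  # GPTModel as defined in model.ipynb
print(sum(p.numel() for p in model.parameters()), "parameters")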

💡 Usage Example

import torch
from model import GPTModel, generate
import tiktoken

# Load configuration
cfg = {...}  # Your config

# Initialize model
model = GPTModel(cfg)
model.load_state_dict(torch.load('picogpt.pt'))
model.eval()

# Generate text
tokenizer = tiktoken.get_encoding("gpt2")
prompt = "The cat"
generated_text = generate(model, tokenizer, prompt, max_new_tokens=50, temperature=0.8)
print(generated_text)

🐛 Troubleshooting

Common Issues

  1. Incoherent outputs:
    • Train for more epochs (20+ recommended)
    • Lower the learning rate (1e-4)
    • Increase training data size
  2. High loss:
    • Ensure loss decreases progressively (from ~7 to ~2)
    • Check data preprocessing
    • Verify model architecture
  3. Out of memory:
    • Reduce context_length
    • Decrease emb_dim or n_layers
    • Use smaller batch sizes

📈 Future Improvements

  • Add learning rate scheduling
  • Implement gradient clipping (both sketched after this list)
  • Add validation set and metrics
  • Support for custom tokenizers
  • Multi-language support (French, etc.)
  • Beam search for generation
  • Model quantization for deployment
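
As a starting point for the first two items, a sketch of how gradient clipping and a cosine learning-rate schedule could be added to a loop like the one sketched earlier (the optimizer and schedule settings are illustrative):

import torch
import torch.nn.functional as F

def train_with_clipping(model, train_loader, epochs=20, lr=1e-4):
    """Training loop variant with gradient clipping and cosine LR decay."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    for epoch in range(epochs):
        for inputs, targets in train_loader:
            logits = model(inputs)
            loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())
            optimizer.zero_grad()
            loss.backward()
            # Cap the global gradient norm at 1.0 before the optimizer step
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
            optimizer.step()
        scheduler.step()  # decay the learning rate once per epoch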

📝 License

This project is open source and available under the MIT License.

🤝 Contributing

Contributions are welcome! Feel free to:

  • Report bugs
  • Suggest features
  • Submit pull requests

👤 Author

Nazim Boudeffa


Built with ❤️ for learning and experimenting with transformer models.

About

The trained model checkpoint is a bit too large for GitHub, so it is kept on an external drive. Please do not modify the code.
