Fine-tuning GPT-2, DistilGPT-2, and AWD-LSTM models on the Facebook AI ROCStories dataset (300k story-prompt pairs) for coherent narrative generation.
| Model | Approach | Notebook |
|---|---|---|
| GPT-2 | Causal LM fine-tuning via HuggingFace Transformers | FIne_tuning_GPT2.ipynb |
| DistilGPT-2 | Lighter GPT-2 variant, faster training | Fine_tuning_DistilGPT.ipynb |
| AWD-LSTM | Recurrent baseline (ASGD weight-dropped LSTM) | AWD_LSTM.ipynb |
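A minimal sketch of the causal LM fine-tuning approach listed above for GPT-2/DistilGPT-2. The model name, training file (`stories_train.txt`), and hyperparameters are illustrative assumptions, not the exact settings used in the notebooks:

```python
# Sketch: causal LM fine-tuning with HuggingFace Transformers.
# Model name, data file, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # or "distilgpt2" for the lighter variant
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Assumes one preprocessed story per line in a plain-text file.
dataset = load_dataset("text", data_files={"train": "stories_train.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-stories",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
trainer.save_model("gpt2-stories")
tokenizer.save_pretrained("gpt2-stories")
```

The same loop applies to DistilGPT-2 by swapping `model_name`; only training time and memory footprint change.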
The ROCStories Corpus from Facebook AI Research:
- 300,000 five-sentence stories
- Each story has an associated cloze-test prompt
- Covers everyday narrative scenarios
Data loading and preprocessing: dataLoading.py · Data Preprocessing.ipynb · Story Generation Data Loading.ipynb
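A hedged sketch of the kind of preprocessing these files perform: joining each prompt and its five-sentence story into one training line. The column names (`storytitle`, `sentence1`..`sentence5`), the `<SEP>` separator, and the output filename are assumptions based on the public ROCStories CSV layout and may differ from what dataLoading.py actually uses:

```python
# Sketch: flatten ROCStories-style rows into single training strings.
# Column names, separator token, and filenames are illustrative assumptions.
from typing import List

import pandas as pd

def build_training_lines(csv_path: str) -> List[str]:
    df = pd.read_csv(csv_path)
    lines = []
    for _, row in df.iterrows():
        sentences = [row[f"sentence{i}"] for i in range(1, 6)]
        # "<SEP>" is just a plain-text separator here, not a special token.
        lines.append(f"{row['storytitle']} <SEP> {' '.join(sentences)}")
    return lines

if __name__ == "__main__":
    lines = build_training_lines("ROCStories.csv")  # hypothetical filename
    with open("stories_train.txt", "w", encoding="utf-8") as f:
        f.write("\n".join(lines))
```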
To set up the environment:

```bash
git clone https://github.com/harivilasp/LLM-Story-Generation.git
cd LLM-Story-Generation
pip install torch transformers datasets jupyter
```

Run the notebooks in this order:
1. `Data Preprocessing.ipynb`: clean and tokenize the dataset
2. `FIne_tuning_GPT2.ipynb` or `Fine_tuning_DistilGPT.ipynb`: train the model
3. Evaluate output in the same notebook and compare with the AWD-LSTM baseline in `AWD_LSTM.ipynb` (a generation sketch follows below)
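Once a model is fine-tuned, story continuations can be sampled for side-by-side comparison. The checkpoint directory, prompt, and sampling parameters below are illustrative assumptions:

```python
# Sketch: sample a story continuation from a fine-tuned checkpoint.
# Checkpoint path, prompt, and sampling settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "gpt2-stories"  # directory written by the fine-tuning sketch above
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prompt = "The Lost Kitten <SEP>"  # same prompt format as the preprocessing sketch
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=80,
    do_sample=True,
    top_p=0.9,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```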
Human evaluations are logged in Human_Evaluations.xlsx, covering:
- Coherence
- Fluency
- Relevance to prompt
Transformer-based models (GPT-2, DistilGPT-2) consistently outperformed the AWD-LSTM baseline on all three dimensions.
- Python 3.7+
- PyTorch
- HuggingFace `transformers` and `datasets`
- Jupyter
- Facebook AI Research — ROCStories dataset
- HuggingFace for pre-trained model weights and Trainer API