Frédéric Berdoz · Luca A. Lanzendörfer · Kaan Bayraktar · Roger Wattenhofer
Accepted at AIGOV @ AAAI 2026
This repository contains experiments for studying attribution in language models via bidirectional gradient optimization on two main datasets: Wikipedia facts and Project Gutenberg books. It is organized into two main experiment folders:
- `wikipedia/`: Experiments on Wikipedia factual knowledge
- `gutenberg/`: Experiments on literary text from Project Gutenberg
All required dependencies are listed in requirements.txt. Install them using:
```
pip install -r requirements.txt
```

The Wikipedia experiments study attribution in the context of factual statements.
First obtain the data, train the model, and compute the Fisher information with the following scripts:

- `wiki_tokenization.py`: Downloads and preprocesses the Wikipedia abstracts
- `wiki_model.py`: Main training script for the Wikipedia language model; trains a GPT-2 architecture from scratch on the Wikipedia data
- `fisher_diag.py`: Computes the diagonal of the Fisher information for the trained model
```
cd wikipedia
python wiki_tokenization.py
python wiki_model.py
python fisher_diag.py
```
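For reference, a common way to estimate the diagonal Fisher information is to average the squared per-parameter gradients of the language-modeling loss over training texts. Below is a minimal, hypothetical sketch; the paths, corpus loop, and any weighting used in the actual `fisher_diag.py` may differ:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Hypothetical paths and corpus; the actual values are set inside fisher_diag.py.
model = GPT2LMHeadModel.from_pretrained("wikipedia/wiki_model")
tokenizer = GPT2TokenizerFast.from_pretrained("wikipedia/wiki_model")
model.eval()

# Accumulator for the squared gradients (diagonal Fisher estimate).
fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters() if p.requires_grad}

texts = ["Ancient Rome was the center of a vast empire."]  # stand-in for the Wikipedia abstracts
for text in texts:
    batch = tokenizer(text, return_tensors="pt")
    model.zero_grad()
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    for n, p in model.named_parameters():
        if p.grad is not None:
            fisher[n] += p.grad.detach() ** 2  # per-example squared gradient

fisher = {n: f / len(texts) for n, f in fisher.items()}  # average over the corpus
torch.save(fisher, "fisher_diag.pt")
```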
Afterwards, generate samples and record losses by performing both gradient ascent and descent:

```
python wikipedia/wiki_experiments.py
```

For each generated text sample, `wiki_experiments.py` saves one gradient-ascent and one gradient-descent optimized model; the model names are specified in `wiki_experiments.py` during the saving process.
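Conceptually, one copy of the model is optimized by gradient descent on the generated sample (finetuned) and another by gradient ascent (unlearned). The sketch below illustrates this bidirectional step with hypothetical paths and sample text; the actual optimizer, number of steps, and any use of the Fisher information are defined in `wiki_experiments.py`:

```python
import copy
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Hypothetical paths and sample text; wiki_experiments.py generates the samples itself.
base = GPT2LMHeadModel.from_pretrained("wikipedia/wiki_model")
tokenizer = GPT2TokenizerFast.from_pretrained("wikipedia/wiki_model")
sample = "Ancient Rome was founded, according to tradition, in 753 BC."
batch = tokenizer(sample, return_tensors="pt")

def optimize(model, sign, steps=10, lr=1e-5):
    """sign=+1: gradient descent on the sample (finetune); sign=-1: gradient ascent (unlearn)."""
    model.train()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = model(**batch, labels=batch["input_ids"]).loss
        (sign * loss).backward()  # flipping the sign turns descent into ascent
        opt.step()
    return model

finetuned = optimize(copy.deepcopy(base), sign=+1)
unlearned = optimize(copy.deepcopy(base), sign=-1)
finetuned.save_pretrained("ancient_rome_finetuned")
unlearned.save_pretrained("ancient_rome_unlearned")
```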
Command template for loss computation; run it once with `--mode finetuned` and once with `--mode unlearned` for each generated sample:

```
python wikipedia/loss_computation.py --finetuned_model_path <model_name> --save_path <results> --mode <finetuned/unlearned>
```

Example to get the influential training samples for the Ancient Rome case:

```
python wikipedia/loss_computation.py --finetuned_model_path "ancient_rome" --save_path "ancient_rome" --mode "finetuned"
python wikipedia/loss_computation.py --finetuned_model_path "ancient_rome" --save_path "ancient_rome" --mode "unlearned"
```
To get the losses of the base model, run with the following arguments:

```
python wikipedia/loss_computation.py --finetuned_model_path "wiki_model" --save_path "losses_bf" --mode "base"
```

Finally, aggregate the results with the tailpatch script:

```
python wikipedia/tailpatch.py --save_path=results_aggregated
```

Run these scripts to extract and preprocess our Gutenberg dataset:
- `concurrent_scraping.py`: Parallel data collection from Project Gutenberg. Use https://www.gutenberg.org/cache/epub/feeds/ (the pgmarc.xml file) to access the current Gutenberg book collection.
- `gutenberg_book_preprocessing.ipynb`: Preprocesses the Gutenberg data. The preprocessed book texts, along with the metadata used, are already provided in `selected_dataset_mixed.csv`. Run the last cell of the notebook to obtain the tokenized dataset for training.
- `untokenize_gutenberg_small.py`: Text reconstruction for BM25 retrieval (a retrieval sketch follows below).
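As an illustration, the reconstructed texts can be ranked against a generated sample with BM25. The sketch below uses the `rank_bm25` package with a toy corpus; this is an assumption for illustration, and the repo's own BM25 baseline may be implemented differently:

```python
from rank_bm25 import BM25Okapi  # assumes the rank_bm25 package is available

# Toy corpus; in practice, use the passages produced by untokenize_gutenberg_small.py.
corpus = [
    "To be, or not to be, that is the question.",
    "It was the best of times, it was the worst of times.",
]
tokenized_corpus = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized_corpus)

# Rank the corpus against a generated sample and return the best match.
query = "a question of being or not being"
print(bm25.get_top_n(query.lower().split(), corpus, n=1))
```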
Training can be done with the following script:

```
# Gutenberg model
cd gutenberg
python gutenberg_training.py --model_path gpt2-scratch-mixed --data_path selected_dataset_mixed.json
```

Sample generation and loss computation:
```
# Generate samples and compute losses for author-specific attribution
cd gutenberg
python samples_gutenberg.py
```

To evaluate our samples with the tailpatch score, use the `gutenberg/tailpatch.ipynb` notebook. If you want to test on your own samples, use the tailpatch function given in the first cell.
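For intuition, a tail-patch-style score (in the spirit of Chang et al., 2024) measures how much the generated sample's loss changes after a single update step on a candidate training passage; the authoritative definition used here is the function in the first cell of `gutenberg/tailpatch.ipynb`. A rough sketch under that assumption:

```python
import copy
import torch

def tailpatch_score(model, tokenizer, candidate_text, generated_text, lr=1e-5):
    """Rough sketch: change in the generated sample's loss after one gradient step
    on the candidate passage (positive = the candidate supports the sample).
    The score actually used in the experiments is defined in gutenberg/tailpatch.ipynb."""
    model = copy.deepcopy(model)
    gen = tokenizer(generated_text, return_tensors="pt")
    cand = tokenizer(candidate_text, return_tensors="pt")

    model.eval()
    with torch.no_grad():
        loss_before = model(**gen, labels=gen["input_ids"]).loss.item()

    # Single update step on the candidate training passage.
    model.train()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    opt.zero_grad()
    model(**cand, labels=cand["input_ids"]).loss.backward()
    opt.step()

    model.eval()
    with torch.no_grad():
        loss_after = model(**gen, labels=gen["input_ids"]).loss.item()
    return loss_before - loss_after
```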
To evaluate via retraining, run one of `gutenberg/retraining_gutenberg_bm25.py`, `retraining_gutenberg_ftun.py`, `retraining_gutenberg_gecko.py`, or `retraining_gutenberg_trackstar.py`. Each script handles how the attributed samples are stored according to its method. Example for ours:

```
cd gutenberg
python retraining_gutenberg_ftun.py --authors "William Shakespeare" --num_samples <k>
```
To compute the comparison baselines, refer to the corresponding papers: BM25 (Robertson & Walker, 1994), GECKO embeddings (Lee et al., 2024), and Scalable Influence and Fact Tracing for Large Language Model Pretraining (Chang et al., 2024). Their implementations, adapted to our experiments, are provided in the wikipedia and gutenberg folders.
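As an illustration of the embedding-based baseline, training passages can be ranked by cosine similarity to a generated sample. The sketch below uses a generic sentence-transformers model purely as a stand-in for the GECKO embeddings used in the paper:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Generic embedding model used purely as a stand-in for GECKO embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")

passages = [
    "To be, or not to be, that is the question.",
    "It was the best of times, it was the worst of times.",
]
query = "a question of being or not being"

passage_emb = model.encode(passages, normalize_embeddings=True)
query_emb = model.encode([query], normalize_embeddings=True)

# Cosine similarity (embeddings are L2-normalized), highest score first.
scores = (passage_emb @ query_emb.T).squeeze()
ranking = np.argsort(-scores)
print([passages[i] for i in ranking])
```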