amazon-science/background-summaries

Background Summarization of Event Timelines (EMNLP 2023)

This is the repository for the EMNLP 2023 paper "Background Summarization of Event Timelines" by Adithya Pratapa, Kevin Small and Markus Dreyer. The image below provides an overview of the background summarization task.

An illustration of background summarization task. The image shows a snippet from the timeline of Michael Jackson's death and the following trial. The image includes five news updates between June 25, 2009 and Nov 29, 2011. A background summary is provided for Nov 29 update, with texts from previous updates highlighted with different colors.

Dataset

The background summarization dataset is available in the data directory and on Hugging Face Datasets.

Training and inference

T5-based systems

We experiment with Flan-T5-XL and Long-T5-TGlobal-XL. For Flan-T5-XL, we explore both generic and query-focused setups. See configs/train.conf for supported model configurations.

# example flan-t5-xl training using deepspeed
bash bash_scripts/t5/train.sh flan-t5-xl 8888

For inference, set the checkpoint path in configs/eval.conf and run the evaluation script.

# example flan-t5-xl inference
bash bash_scripts/t5/eval.sh flan-t5-xl

GPT-based systems

We run zero-shot inference with GPT-3.5. See configs/gpt.conf for supported model configurations.

bash bash_scripts/gpt/predict.sh gpt-3.5-turbo

Background Utility Score (BUS)

We propose a new QA-based evaluation metric that measures the utility of a background summary for answering questions about a news update. See the illustration below.

An illustration of the proposed Background Utility Score (BUS) metric. For the Nov 29, 2011 news update from the Michael Jackson timeline, the image shows two background questions generated by prompting GPT-3.5. For each question, a table indicates whether each background summary (Flan-T5, GPT-3.5 and Human) answers the question.

See src/bus/bus.py for details on GPT-3.5 and GPT-4 based BUS metrics.
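As a rough illustration of how per-question answerability judgments could be turned into a single score, here is a minimal sketch. It assumes the score is simply the fraction of background questions that a summary answers; the function names and the example judgments are hypothetical and are not taken from src/bus/bus.py, which remains the reference for the actual metric.

```python
from typing import Dict, List


def utility_score(answered: List[bool]) -> float:
    """Fraction of background questions that a summary answers.

    Assumption: one boolean per question, True if the summary answers it.
    """
    if not answered:
        return 0.0
    return sum(answered) / len(answered)


def compare_systems(judgments: Dict[str, List[bool]]) -> Dict[str, float]:
    """Score each system's background summary on the same question set."""
    return {system: utility_score(flags) for system, flags in judgments.items()}


# Hypothetical judgments for two questions about one news update.
example = {
    "flan-t5": [True, False],
    "gpt-3.5": [False, False],
    "human": [True, True],
}
scores = compare_systems(example)
```

In this sketch every system is judged on the same questions, so the resulting scores are directly comparable for a given news update.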

Human and BUS evaluation data

The results directory contains the data from our Mechanical Turk and BUS evaluations. For the 1,000 news updates in the test set, it includes human-written and system-generated backgrounds, along with results from best-worst ratings, BUS--human, BUS--GPT-3.5 and BUS--GPT-4.

MTurk setup

See src/mturk for details on MTurk setup.

Model checkpoints and predictions

To download the model checkpoints and predictions, run:

URL=https://d1f9rvlwrb54wt.cloudfront.net/background-summaries
wget $URL/models-flan-t5.tgz # flan-t5-xl (file size: ~10G)
wget $URL/models-flan-t5-ift.tgz # flan-t5-xl-ift, flan-t5-xl-ift-ents (file size: ~20G)
wget $URL/models-gpt-anns.tgz # gpt-3.5-turbo, gpt-3.5-turbo-cond-ents, human annotators (file size: ~5M)
wget $URL/models-long-t5.tgz # long-t5-tglobal-xl (file size: ~10G)
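Once an archive is downloaded, it needs to be unpacked before the checkpoint path can be set in configs/eval.conf. The sketch below shows one way to do that from Python. It assumes the .tgz files are gzip-compressed tar archives; the helper names and the output directory are hypothetical, and the layout inside each archive is not documented here.

```python
import tarfile
from pathlib import Path
from typing import List

BASE_URL = "https://d1f9rvlwrb54wt.cloudfront.net/background-summaries"
ARCHIVES = [
    "models-flan-t5.tgz",
    "models-flan-t5-ift.tgz",
    "models-gpt-anns.tgz",
    "models-long-t5.tgz",
]


def archive_urls(base_url: str = BASE_URL) -> List[str]:
    """Build the download URL for each archive listed above."""
    return [f"{base_url}/{name}" for name in ARCHIVES]


def extract_archive(archive: Path, out_dir: Path) -> List[str]:
    """Unpack a gzip-compressed tar archive and return its member names.

    Assumption: the archive is a regular .tgz (tar + gzip).
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(out_dir)
    return names
```

For example, `extract_archive(Path("models-gpt-anns.tgz"), Path("models"))` would unpack the smallest archive into a hypothetical `models` directory.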

Security

See CONTRIBUTING for more information.

License

This project is licensed under the CC-BY-NC-4.0 License. See the LICENSE file.

Reference

You can cite our paper as follows:

@inproceedings{pratapa-etal-2023-background,
    title = "Background Summarization of Event Timelines",
    author = "Pratapa, Adithya and Small, Kevin and Dreyer, Markus",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    publisher = "Association for Computational Linguistics",
    year = "2023",
}
