Learning to Ask Informative Questions: Enhancing LLMs with Preference Optimization and Expected Information Gain
Code for the EMNLP 2024 paper (Findings).
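For readers unfamiliar with the objective named in the title, here is a minimal, stdlib-only sketch of expected information gain (EIG) for a yes/no question asked over a set of candidate items, using the standard entropy definition. This is illustrative only and is not the paper's exact implementation:

```python
import math


def expected_information_gain(prior, question_mask):
    """EIG of a yes/no question over candidate items.

    prior: dict mapping item -> probability (sums to 1).
    question_mask: dict mapping item -> True if the answer
    for that item would be 'yes'.
    Returns H(prior) minus the expected posterior entropy.
    """
    def entropy(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    p_yes = sum(p for item, p in prior.items() if question_mask[item])
    p_no = 1.0 - p_yes
    h_prior = entropy(prior.values())
    # Renormalized posterior entropy under each possible answer
    h_yes = entropy(p / p_yes for item, p in prior.items()
                    if question_mask[item]) if p_yes > 0 else 0.0
    h_no = entropy(p / p_no for item, p in prior.items()
                   if not question_mask[item]) if p_no > 0 else 0.0
    return h_prior - (p_yes * h_yes + p_no * h_no)


# A question that splits 8 equally likely candidates in half gains 1 bit.
items = range(8)
prior = {i: 1 / 8 for i in items}
mask = {i: i < 4 for i in items}
```

Under this formulation, the most informative yes/no question is the one that splits the remaining probability mass most evenly.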
If you use conda, create the environment for this project by running:

conda env create -f environment.yml

If you use venv, activate your environment and run:
pip install -r requirements.txt

To create the datasets for Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO), insert your huggingface_login credentials and the path where HF models and datasets should be saved in lines 385-386 of bootstrapping.py. Then run:
python scripts/bootstrapping.py

This will populate the data/bootstrapped folder and create a HuggingFace dataset that will be used for DPO (the DPO dataset used in the paper).
To train the base model with SFT, insert the cache_dir and output_dir, then run:

python scripts/SFT.py

The best-performing checkpoint for the SFT model comes after 4k samples (SFT adapter).
For DPO training, insert the cache_dir, output_dir, and huggingface_login, then run:

python scripts/DPO.py

The trained DPO model is available on the HuggingFace Hub (DPO adapter).
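For reference, the per-example DPO objective (Rafailov et al., 2023) that the training script optimizes can be sketched with the standard library alone. This is a hedged illustration of the loss formula, not the code in scripts/DPO.py; the variable names are my own:

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss.

    Inputs are summed token log-probabilities of the chosen and
    rejected responses under the trained policy and the frozen
    reference model. Lower loss means the policy prefers the
    chosen response more strongly than the reference does.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(logits)), written in a numerically stable form
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy matches the reference exactly, the loss is -log(sigmoid(0)) = log 2; it drops below that as the policy learns to favor the chosen (more informative) question over the rejected one.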
If you find our work useful, please cite our paper as:
@inproceedings{mazzaccara2024learningtoask,
title = "Learning to Ask Informative Questions: Enhancing LLMs with Preference Optimization and Expected Information Gain",
author = "Mazzaccara, Davide and
Testoni, Alberto and
Bernardi, Raffaella",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
year = "2024",
url = "https://aclanthology.org/2024.findings-emnlp.291/",
}