A lightweight plugin for HuggingFace's `DPOTrainer`, achieving up to an 11.6% win-rate improvement with minimal overhead compared to fast DPO.
- We need the HuggingFace Hub and WandB to manage experiments. Please fill in `./configs/services/huggingface.yaml` and `./configs/services/wandb.yaml` with your account info.
- We need the OpenAI API to evaluate models. Please fill in `./configs/services/openai.yaml` with your account info.
- We use a HuggingFace Space app to retrieve and review results. Please fill in `./viewer/.env` with your account info.
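As an illustration, a service config file might look like the sketch below. The field names here are hypothetical; check the template files shipped under `./configs/services/` for the actual schema.

```yaml
# ./configs/services/wandb.yaml -- hypothetical field names, for illustration only
entity: your-wandb-username
project: cdpo-experiments
api_key: YOUR_WANDB_API_KEY
```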
- With `python 3.10.*` and `CUDA 12.*` installed, you can run `pip install -e .` to install this package, called `cdpo`.
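Before installing, it can help to confirm the interpreter matches the stated requirement. A minimal sketch (the `python_ok` helper is hypothetical, not part of `cdpo`):

```python
import sys

def python_ok(version_info=sys.version_info, required_minor=(3, 10)):
    """Return True if the interpreter matches the Python 3.10.* requirement."""
    return tuple(version_info[:2]) == required_minor

if __name__ == "__main__":
    if not python_ok():
        print("cdpo expects Python 3.10.*; found", sys.version.split()[0])
```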
- Fill in or modify `./configs/tasks.yaml` for the set of experiments to run.
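A sketch of what an experiment entry in `tasks.yaml` could look like. The keys and values below are hypothetical assumptions for illustration; consult the shipped `./configs/tasks.yaml` for the real schema.

```yaml
# ./configs/tasks.yaml -- hypothetical structure, for illustration only
experiments:
  - model: Qwen1.5-0.5B
    seeds: [0, 1, 2]
```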
- Run the command `cdpo execute` to run all experiments specified.
- Inside the `./viewer` folder, run `streamlit run app.py` to start the result viewer. Use the UI there to analyze the results.
The approximate training time and memory requirements of each SAIL training run on the three models are:
- Qwen1.5-0.5B: 1-4 hours with 4×A40 GPUs
- Phi-3-3.8B: 2-8 hours with 4×RTX 6000 Ada GPUs
- Llama-3-8B: 2-12 hours with 4×A100 GPUs
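For planning runs, the requirements listed above can be encoded as a small lookup table. This is a hypothetical convenience helper, not part of the `cdpo` package:

```python
# Approximate resource requirements from the README (hypothetical helper).
REQUIREMENTS = {
    "Qwen1.5-0.5B": {"hours": (1, 4), "gpus": "4xA40"},
    "Phi-3-3.8B": {"hours": (2, 8), "gpus": "4xRTX6000Ada"},
    "Llama-3-8B": {"hours": (2, 12), "gpus": "4xA100"},
}

def max_hours(model: str) -> int:
    """Worst-case training time in hours for a given model."""
    return REQUIREMENTS[model]["hours"][1]

print(max_hours("Llama-3-8B"))  # → 12
```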