A lightweight plugin for HuggingFace's `DPOTrainer`, achieving up to an 11.6% win-rate improvement with minimal overhead compared to fast DPO.
- We need the HuggingFace Hub and WandB to manage experiments. Please fill in `./configs/services/huggingface.yaml` and `./configs/services/wandb.yaml` with your account info.
- We need the OpenAI API to evaluate models. Please fill in `./configs/services/openai.yaml` with your account info.
- We use a HuggingFace Space app to retrieve and review results. Please fill in `./viewer/.env` with your account info.
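As an illustration, a service config file might look like the sketch below. The field names here are hypothetical; check the template files shipped under `./configs/services/` for the actual schema.

```yaml
# ./configs/services/wandb.yaml -- hypothetical field names, for illustration only
entity: your-wandb-username
project: cdpo-experiments
api_key: YOUR_WANDB_API_KEY
```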
- With `python 3.10.*` and `CUDA 12.*` installed, you can run `pip install -e .` to install this package, called `cdpo`.
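Before installing, it can help to confirm the interpreter matches the stated requirement. A minimal sketch (the `python_ok` helper is hypothetical, not part of `cdpo`):

```python
import sys

def python_ok(version_info=sys.version_info, required_minor=(3, 10)):
    """Return True if the interpreter matches the Python 3.10.* requirement."""
    return tuple(version_info[:2]) == required_minor

if __name__ == "__main__":
    if not python_ok():
        print("cdpo expects Python 3.10.*; found", sys.version.split()[0])
```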
- Fill in or modify `./configs/tasks.yaml` for the set of experiments to run.
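A sketch of what an experiment entry in `tasks.yaml` could look like. The keys and values below are hypothetical assumptions for illustration; consult the shipped `./configs/tasks.yaml` for the real schema.

```yaml
# ./configs/tasks.yaml -- hypothetical structure, for illustration only
experiments:
  - model: Qwen1.5-0.5B
    seeds: [0, 1, 2]
```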
- Run the command `cdpo execute` to run all experiments specified.
- Inside the `./viewer` folder, run `streamlit run app.py` to start the result viewer. Use the UI there to analyze the results.
The approximate training time and memory requirements of each SAIL training run on the three models are:
- Qwen1.5-0.5B: 1-4 hours with 4×A40 GPUs
- Phi-3-3.8B: 2-8 hours with 4×RTX 6000 Ada GPUs
- Llama-3-8B: 2-12 hours with 4×A100 GPUs
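For planning runs, the requirements listed above can be encoded as a small lookup table. This is a hypothetical convenience helper, not part of the `cdpo` package:

```python
# Approximate resource requirements from the README (hypothetical helper).
REQUIREMENTS = {
    "Qwen1.5-0.5B": {"hours": (1, 4), "gpus": "4xA40"},
    "Phi-3-3.8B": {"hours": (2, 8), "gpus": "4xRTX6000Ada"},
    "Llama-3-8B": {"hours": (2, 12), "gpus": "4xA100"},
}

def max_hours(model: str) -> int:
    """Worst-case training time in hours for a given model."""
    return REQUIREMENTS[model]["hours"][1]

print(max_hours("Llama-3-8B"))  # → 12
```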