
The Diffusion Duality Series

By Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, Volodymyr Kuleshov


Unlocks few-step generation in discrete diffusion-LLMs via the underlying Gaussian diffusion.

By Justin Deschenaux, Caglar Gulcehre, Subham Sekhar Sahoo


Uniform-state diffusion beats masked diffusion on text and image generation!

This repository contains the code for the two papers in the Diffusion Duality series. It includes:

  • Duo / $\text{Duo}^\text{++}$ sampling (ancestral, ReMDM, $\Psi$-samplers, greedy-tail) — Sampling & Eval
  • Original and efficient curriculum training strategies — Training
  • Discrete Consistency Distillation (DCD) — Distillation
  • Baselines (AR, MDLM, SEDD, D3PM) — Baselines

Getting Started | Checkpoints | Citation

Getting Started

To get started, create a conda environment containing the required dependencies.

conda create -n duo python=3.12
conda activate duo
conda install nvidia/label/cuda-12.4.0::cuda-toolkit
pip install -r requirements.txt
pip install flash_attn==2.7.4.post1

Checkpoints

Training

This repo implements the original Duo curriculum as well as the fast $\text{Duo}^\text{++}$ curriculum. By default, the training scripts use the original curriculum. To enable the efficient curriculum, replace algo.curriculum.mode=simple with algo.curriculum.mode=poly9 (see the comments in each training script).
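As a sketch, the override is appended to the training launch command inside the script. Note that `mode=train` below is an assumption made by analogy with the `mode=sample_eval` command shown later, not a flag verified against the repo:

```shell
# Default (original Duo curriculum): algo.curriculum.mode=simple
# Efficient Duo++ curriculum: pass the poly9 override instead.
# `mode=train` is assumed here by analogy with the sampling command.
python main.py \
  mode=train \
  algo=duo_base \
  algo.curriculum.mode=poly9
```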

To train $\text{Duo}^\text{++}$, use the training scripts in scripts/.

Notes:

  • Run mkdir watch_folder to create a directory for SLURM logs, then submit any script in scripts/ as a SLURM job: sbatch scripts/ABC_XYZ.sh
  • Control the batch size per GPU using the argument loader.batch_size. If loader.batch_size * num_gpus < loader.global_batch_size, PyTorch Lightning resorts to gradient accumulation.
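The accumulation behavior in the note above reduces to integer arithmetic; here is a sketch with illustrative values (not defaults from this repo):

```shell
# Illustrative values only: the number of gradient-accumulation steps
# Lightning needs so that batch_size * num_gpus * accum == global_batch_size.
batch_size=16
num_gpus=8
global_batch_size=512
accum=$(( global_batch_size / (batch_size * num_gpus) ))
echo "accumulation steps: $accum"   # prints: accumulation steps: 4
```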

Discrete Consistency Distillation

To distill a model with Discrete Consistency Distillation (Alg. 1 in the Duo paper), use scripts/distil_owt.sh.

Sampling & Eval

Likelihood

To compute perplexity on the OWT validation set, use scripts/eval_owt_duo.sh; for zero-shot perplexities, use scripts/zero_shot_duo.sh.

Sampling

For ancestral sampling, use the scripts matching scripts/gen_ppl_*.sh. To sample with PC samplers such as ReMDM and our $\Psi$-samplers, use the scripts in scripts/psi_samplers; that directory contains examples for sampling both text and images.

To use the "Greedy-tail sampler" (equivalent to nucleus sampling in AR models; see Sec. 4.2 in the paper), set sampling.noise_removal=greedy. Using the default sampling.noise_removal=ancestral will produce more diverse samples (higher entropy) but with worse generative perplexity.

To sample from a HuggingFace checkpoint (text only), run the following command:

python main.py \
  mode=sample_eval \
  loader.batch_size=2 \
  loader.eval_batch_size=8 \
  data=openwebtext-split \
  algo=duo_base \
  algo.backbone=hf_dit \
  eval.checkpoint_path=s-sahoo/duo-distilled \
  sampling.steps=8 \
  sampling.num_sample_batches=1 \
  sampling.noise_removal=greedy \
  +wandb.offline=true 

To use the example scripts with raw checkpoints (see Checkpoints), download them and set the checkpoint path in the script.
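A sketch of that substitution (the path below is a placeholder, and other flags in the script, such as the backbone, may also need adjusting for raw checkpoints):

```shell
# Placeholder path; replace with the location of the downloaded checkpoint.
# In the chosen script (e.g. scripts/eval_owt_duo.sh), set:
#   eval.checkpoint_path=/path/to/duo.ckpt
```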

Baselines

Download the baseline checkpoints (see Checkpoints) and set the paths in the corresponding shell scripts.

Acknowledgements & Citation

This repository was built on top of the MDLM GitHub repository. Cite our papers using:

@inproceedings{
    sahoo2025the,
    title={The Diffusion Duality},
    author={Subham Sekhar Sahoo and Justin Deschenaux and Aaron Gokaslan and Guanghan Wang and Justin T Chiu and Volodymyr Kuleshov},
    booktitle={Forty-second International Conference on Machine Learning},
    year={2025},
    url={https://openreview.net/forum?id=9P9Y8FOSOk}
}

@inproceedings{
    deschenaux2026the,
    title={The Diffusion Duality, Chapter {II}: $\Psi$-Samplers and Efficient Curriculum},
    author={Justin Deschenaux and Caglar Gulcehre and Subham Sekhar Sahoo},
    booktitle={The Fourteenth International Conference on Learning Representations},
    year={2026},
    url={https://openreview.net/forum?id=RSIoYWIzaP}
}

About

[ICML 2025] The Diffusion Duality
