**The Diffusion Duality** (ICML 2025)
By Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, Volodymyr Kuleshov
Unlocks few-step generation in discrete diffusion LLMs via the underlying Gaussian diffusion.

**The Diffusion Duality, Chapter II: Ψ-Samplers and Efficient Curriculum** (ICLR 2026)
By Justin Deschenaux, Caglar Gulcehre, Subham Sekhar Sahoo
Uniform-state diffusion beats masked diffusion on text and image generation!
This repository contains the code for the two papers in the Diffusion Duality series. It includes:

- Duo / Duo++ sampling (ancestral, ReMDM, Ψ-samplers, greedy-tail) — Sampling & Eval
- Original and efficient curriculum training strategies — Training
- Discrete Consistency Distillation (DCD) — Distillation
- Baselines (AR, MDLM, SEDD, D3PM) — Baselines
Getting Started | Checkpoints | Citation
## Getting Started

To get started, create a conda environment containing the required dependencies:

```bash
conda create -n duo python=3.12
conda activate duo
conda install nvidia/label/cuda-12.4.0::cuda-toolkit
pip install -r requirements.txt
pip install flash_attn==2.7.4.post1
```

## Checkpoints

- Duo (Language Modeling): trained on OpenWebText for 1M training steps (distilled / base):
  - HuggingFace 🤗
  - Google Drive folder, as the HF checkpoints can't be fine-tuned.
- Duo (Image Modeling): trained on CIFAR-10
  - Google Drive folder — download.
- Baselines (SEDD, MDLM, AR): trained on OpenWebText
  - Google Drive folder — download `ar.ckpt`, `mdlm.ckpt`, `sedd.ckpt`.
## Training

This repo implements the original Duo curriculum as well as the fast curriculum. To use the fast curriculum, replace `algo.curriculum.mode=simple` with `algo.curriculum.mode=poly9` (see the comments in each training script).
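Purely as a reading aid for the mode names above, here is a hypothetical sketch of how a `simple` (linear) versus `poly9` (degree-9 polynomial) schedule could map training progress to a curriculum parameter. The function name and the semantics of each mode are assumptions, not the repo's actual implementation — consult the training scripts for the real behavior.

```python
def curriculum_progress(t: float, mode: str = "poly9") -> float:
    """Map normalized training progress t in [0, 1] to a curriculum value.

    HYPOTHETICAL illustration only: we assume "simple" is a linear ramp and
    "polyN" raises progress to the N-th power, which front-loads the easy
    (low-t) regime of training.
    """
    if mode == "simple":
        return t  # linear ramp
    if mode.startswith("poly"):
        degree = int(mode[len("poly"):])  # e.g. "poly9" -> 9
        return t ** degree
    raise ValueError(f"unknown curriculum mode: {mode}")
```

Under this reading, a `poly9` schedule spends most of training near the easy end of the curriculum and ramps up sharply at the end.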
To train Duo, use the following scripts:

- LM1B
  - w/ sentencepacking (same as in D3PM)
    - Training script: `scripts/train_lm1b_duo_sentencepacking.sh`
    - Wandb run
  - w/o sentencepacking (same as in MDLM, SEDD)
    - Training script: `scripts/train_lm1b_duo.sh`
    - Wandb run
- OWT: `scripts/train_owt_duo.sh`
- CIFAR-10:
  - Duo: `scripts/train_cifar10_duo_cosine.sh`
  - MDLM: `scripts/train_cifar10_mdlm_cosine.sh`
  - Both scripts default to a cosine noise schedule. To use log-linear instead, set `noise=log-linear`.
Notes:
- Run `mkdir watch_folder` to create a directory for slurm logs, then run any script in `scripts/` as a slurm job: `sbatch scripts/ABC_XYZ.sh`.
- Control the batch size per GPU with `loader.batch_size`. If `loader.batch_size * num_gpus < loader.global_batch_size`, PyTorch Lightning resorts to gradient accumulation.
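The arithmetic behind that last note can be sketched as follows. This is a minimal illustration, not the repo's or Lightning's actual code; the helper name is hypothetical.

```python
def accumulation_steps(batch_size: int, num_gpus: int, global_batch_size: int) -> int:
    """Number of micro-batches accumulated per optimizer step.

    Hypothetical helper: each forward/backward pass sees
    batch_size * num_gpus samples, so covering global_batch_size
    requires accumulating over the ratio of the two.
    """
    per_pass = batch_size * num_gpus
    if global_batch_size % per_pass != 0:
        raise ValueError("global_batch_size must be divisible by batch_size * num_gpus")
    return global_batch_size // per_pass

# e.g. loader.batch_size=2 on 8 GPUs with loader.global_batch_size=512:
print(accumulation_steps(2, 8, 512))  # 32 micro-batches per optimizer step
```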
## Distillation

To distill a model using Discrete Consistency Distillation (Alg. 1 in the Duo paper), use `scripts/distil_owt.sh`.
## Sampling & Eval

To compute perplexity on the OWT validation set, use `scripts/eval_owt_duo.sh`; for zero-shot perplexities, use `scripts/zero_shot_duo.sh`.
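For reference, perplexity is just the exponential of the mean per-token negative log-likelihood. A minimal sketch (not the repo's evaluation code):

```python
import math

def perplexity(nlls: list[float]) -> float:
    """Perplexity from per-token negative log-likelihoods (in nats):
    ppl = exp(mean NLL). Lower is better."""
    return math.exp(sum(nlls) / len(nlls))

# A sequence whose tokens each have NLL 2.0 nats has perplexity exp(2.0):
print(perplexity([2.0, 2.0, 2.0]))
```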
You can sample with ancestral sampling using the scripts in `scripts/gen_ppl_*.sh`. To sample with predictor-corrector samplers such as ReMDM and our Ψ-samplers, see `scripts/psi_samplers`; this directory contains examples for sampling text and images.
To use the "greedy-tail sampler" (equivalent to nucleus sampling in AR models; see Sec. 4.2 in the paper), set `sampling.noise_removal=greedy`. The default `sampling.noise_removal=ancestral` produces more diverse samples (higher entropy) but with worse generative perplexity.
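The two `noise_removal` modes can be pictured as follows at the final denoising step. This is an illustrative sketch under our own simplification, not the repo's sampler: greedy takes the argmax of the model's per-token categorical distribution, while ancestral samples from it.

```python
import numpy as np

def remove_noise(probs: np.ndarray, mode: str, rng: np.random.Generator) -> int:
    """Pick a token from a categorical distribution over the vocabulary.

    Illustrative sketch: "greedy" is deterministic (lower generative
    perplexity), "ancestral" is stochastic (more diverse, higher entropy).
    """
    if mode == "greedy":
        return int(np.argmax(probs))          # modal token
    if mode == "ancestral":
        return int(rng.choice(len(probs), p=probs))  # sample a token
    raise ValueError(f"unknown mode: {mode}")

rng = np.random.default_rng(0)
probs = np.array([0.1, 0.7, 0.2])
print(remove_noise(probs, "greedy", rng))  # → 1 (the modal token)
```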
To sample from a HuggingFace checkpoint (text only), run the following command:
```bash
python main.py \
  mode=sample_eval \
  loader.batch_size=2 \
  loader.eval_batch_size=8 \
  data=openwebtext-split \
  algo=duo_base \
  algo.backbone=hf_dit \
  eval.checkpoint_path=s-sahoo/duo-distilled \
  sampling.steps=8 \
  sampling.num_sample_batches=1 \
  sampling.noise_removal=greedy \
  +wandb.offline=true
```

To use the example scripts with raw checkpoints (see Checkpoints), download them and set the checkpoint path in the script.
## Baselines

Download the baseline checkpoints (see Checkpoints) and set the paths in the respective shell scripts:

- `scripts/eval_owt_*.sh` for computing validation perplexity on OWT.
- `scripts/gen_ppl_*.sh` for generating text samples and evaluating them.
- `scripts/zero_shot_*.sh` for computing zero-shot perplexities.
- `scripts/train_*.sh` for training the models.
This repository was built off of MDLM's GitHub repository.

## Citation

Cite our papers using:
```bib
@inproceedings{
sahoo2025the,
title={The Diffusion Duality},
author={Subham Sekhar Sahoo and Justin Deschenaux and Aaron Gokaslan and Guanghan Wang and Justin T Chiu and Volodymyr Kuleshov},
booktitle={Forty-second International Conference on Machine Learning},
year={2025},
url={https://openreview.net/forum?id=9P9Y8FOSOk}
}

@inproceedings{
deschenaux2026the,
title={The Diffusion Duality, Chapter {II}: $\Psi$-Samplers and Efficient Curriculum},
author={Justin Deschenaux and Caglar Gulcehre and Subham Sekhar Sahoo},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=RSIoYWIzaP}
}
```

