**The Diffusion Duality** (ICML 2025)
By Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan, Guanghan Wang, Justin Chiu, Volodymyr Kuleshov
Unlocks few-step generation in discrete diffusion LLMs via the underlying Gaussian diffusion.

**The Diffusion Duality, Chapter II: Ψ-Samplers and Efficient Curriculum** (ICLR 2026)
By Justin Deschenaux, Caglar Gulcehre, Subham Sekhar Sahoo
Uniform-state diffusion beats masked diffusion on text and image generation!
This repository contains the code for the two papers in the Diffusion Duality series. It includes:

- Duo / Duo++ sampling (ancestral, ReMDM, Ψ-samplers, greedy-tail) — Sampling & Eval
- Original and efficient curriculum training strategies — Training
- Discrete Consistency Distillation (DCD) — Distillation
- Baselines (AR, MDLM, SEDD, D3PM) — Baselines
Getting Started | Checkpoints | Citation
## Getting Started

To get started, create a conda environment containing the required dependencies:

```bash
conda create -n duo python=3.12
conda activate duo
conda install nvidia/label/cuda-12.4.0::cuda-toolkit
pip install -r requirements.txt
pip install flash_attn==2.7.4.post1
```

## Checkpoints

- Duo (Language Modeling): trained on OpenWebText for 1M training steps (distilled / base):
  - HuggingFace 🤗
  - Google Drive folder, as the HF checkpoints can't be fine-tuned.
- Duo (Image Modeling): trained on CIFAR-10
  - Google Drive folder — download.
- Baselines (SEDD, MDLM, AR): trained on OpenWebText
  - Google Drive folder — download `ar.ckpt`, `mdlm.ckpt`, `sedd.ckpt`.
## Training

This repo implements the original Duo curriculum as well as the fast curriculum. To use the fast curriculum, replace `algo.curriculum.mode=simple` with `algo.curriculum.mode=poly9` (see the comments in each training script).
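Purely as a reading aid for the mode names above, here is a hypothetical sketch of how a `simple` (linear) versus `poly9` (degree-9 polynomial) schedule could map training progress to a curriculum parameter. The function name and the semantics of each mode are assumptions, not the repo's actual implementation — consult the training scripts for the real behavior.

```python
def curriculum_progress(t: float, mode: str = "poly9") -> float:
    """Map normalized training progress t in [0, 1] to a curriculum value.

    HYPOTHETICAL illustration only: we assume "simple" is a linear ramp and
    "polyN" raises progress to the N-th power, which front-loads the easy
    (low-t) regime of training.
    """
    if mode == "simple":
        return t  # linear ramp
    if mode.startswith("poly"):
        degree = int(mode[len("poly"):])  # e.g. "poly9" -> 9
        return t ** degree
    raise ValueError(f"unknown curriculum mode: {mode}")
```

Under this reading, a `poly9` schedule spends most of training near the easy end of the curriculum and ramps up sharply at the end.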
To train Duo, use the following scripts:

- LM1B
  - w/ sentencepacking (same as in D3PM)
    - Training script: `scripts/train_lm1b_duo_sentencepacking.sh`
    - Wandb run
  - w/o sentencepacking (same as in MDLM, SEDD)
    - Training script: `scripts/train_lm1b_duo.sh`
    - Wandb run
- OWT: `scripts/train_owt_duo.sh`
- CIFAR-10:
  - Duo: `scripts/train_cifar10_duo_cosine.sh`
  - MDLM: `scripts/train_cifar10_mdlm_cosine.sh`
  - Both scripts default to a cosine noise schedule. To use log-linear instead, set `noise=log-linear`.
Notes:
- Run `mkdir watch_folder` to create a directory for slurm logs, then run any script in `scripts/` as a slurm job: `sbatch scripts/ABC_XYZ.sh`.
- Control the batch size per GPU with `loader.batch_size`. If `loader.batch_size * num_gpus < loader.global_batch_size`, PyTorch Lightning resorts to gradient accumulation.
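The arithmetic behind that last note can be sketched as follows. This is a minimal illustration, not the repo's or Lightning's actual code; the helper name is hypothetical.

```python
def accumulation_steps(batch_size: int, num_gpus: int, global_batch_size: int) -> int:
    """Number of micro-batches accumulated per optimizer step.

    Hypothetical helper: each forward/backward pass sees
    batch_size * num_gpus samples, so covering global_batch_size
    requires accumulating over the ratio of the two.
    """
    per_pass = batch_size * num_gpus
    if global_batch_size % per_pass != 0:
        raise ValueError("global_batch_size must be divisible by batch_size * num_gpus")
    return global_batch_size // per_pass

# e.g. loader.batch_size=2 on 8 GPUs with loader.global_batch_size=512:
print(accumulation_steps(2, 8, 512))  # 32 micro-batches per optimizer step
```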
## Distillation

To distill a model using Discrete Consistency Distillation (Alg. 1 in the Duo paper), use `scripts/distil_owt.sh`.
## Sampling & Eval

To compute perplexity on the OWT validation set, use `scripts/eval_owt_duo.sh`; for zero-shot perplexities, use `scripts/zero_shot_duo.sh`.
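For reference, perplexity is just the exponential of the mean per-token negative log-likelihood. A minimal sketch (not the repo's evaluation code):

```python
import math

def perplexity(nlls: list[float]) -> float:
    """Perplexity from per-token negative log-likelihoods (in nats):
    ppl = exp(mean NLL). Lower is better."""
    return math.exp(sum(nlls) / len(nlls))

# A sequence whose tokens each have NLL 2.0 nats has perplexity exp(2.0):
print(perplexity([2.0, 2.0, 2.0]))
```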
You can sample with ancestral sampling using the scripts in `scripts/gen_ppl_*.sh`. To sample with predictor-corrector samplers such as ReMDM and our Ψ-samplers, see `scripts/psi_samplers`; this directory contains examples for sampling text and images.
To use the "greedy-tail sampler" (equivalent to nucleus sampling in AR models; see Sec. 4.2 in the paper), set `sampling.noise_removal=greedy`. The default `sampling.noise_removal=ancestral` produces more diverse samples (higher entropy) but with worse generative perplexity.
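The two `noise_removal` modes can be pictured as follows at the final denoising step. This is an illustrative sketch under our own simplification, not the repo's sampler: greedy takes the argmax of the model's per-token categorical distribution, while ancestral samples from it.

```python
import numpy as np

def remove_noise(probs: np.ndarray, mode: str, rng: np.random.Generator) -> int:
    """Pick a token from a categorical distribution over the vocabulary.

    Illustrative sketch: "greedy" is deterministic (lower generative
    perplexity), "ancestral" is stochastic (more diverse, higher entropy).
    """
    if mode == "greedy":
        return int(np.argmax(probs))          # modal token
    if mode == "ancestral":
        return int(rng.choice(len(probs), p=probs))  # sample a token
    raise ValueError(f"unknown mode: {mode}")

rng = np.random.default_rng(0)
probs = np.array([0.1, 0.7, 0.2])
print(remove_noise(probs, "greedy", rng))  # → 1 (the modal token)
```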
To sample from a HuggingFace checkpoint (text only), run the following command:
```bash
python main.py \
  mode=sample_eval \
  loader.batch_size=2 \
  loader.eval_batch_size=8 \
  data=openwebtext-split \
  algo=duo_base \
  algo.backbone=hf_dit \
  eval.checkpoint_path=s-sahoo/duo-distilled \
  sampling.steps=8 \
  sampling.num_sample_batches=1 \
  sampling.noise_removal=greedy \
  +wandb.offline=true
```

To use the example scripts with raw checkpoints (see Checkpoints), download them and set the checkpoint path in the script.
## Baselines

Download the baseline checkpoints (see Checkpoints) and set the paths in the respective shell scripts:

- `scripts/eval_owt_*.sh` for computing validation perplexity on OWT.
- `scripts/gen_ppl_*.sh` for generating text samples and evaluating them.
- `scripts/zero_shot_*.sh` for computing zero-shot perplexities.
- `scripts/train_*.sh` for training the models.
This repository was built off of MDLM's GitHub repository.

## Citation

Cite our papers using:
```bib
@inproceedings{
sahoo2025the,
title={The Diffusion Duality},
author={Subham Sekhar Sahoo and Justin Deschenaux and Aaron Gokaslan and Guanghan Wang and Justin T Chiu and Volodymyr Kuleshov},
booktitle={Forty-second International Conference on Machine Learning},
year={2025},
url={https://openreview.net/forum?id=9P9Y8FOSOk}
}

@inproceedings{
deschenaux2026the,
title={The Diffusion Duality, Chapter {II}: $\Psi$-Samplers and Efficient Curriculum},
author={Justin Deschenaux and Caglar Gulcehre and Subham Sekhar Sahoo},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=RSIoYWIzaP}
}
```

