
Taming Sampling Perturbations with Variance Expansion Loss for Latent Diffusion Models (CVPR 2026)

Qifan Li, Xingyu Zhou, Jinhua Zhang, Weiyi You, Shuhang Gu


⭐If you like this work, please help star this repo. Thanks!🤗

Visualization Results

Dependencies and Installation

# git clone this repository
git clone https://github.com/CVL-UESTC/VE-Loss.git
cd VE-Loss

conda create -n VE-Loss python
conda activate VE-Loss
pip install -r requirements.txt

Inference

Download Pre-trained Models

Download the pretrained tokenizer weights, the diffusion model weights, and the latent statistics file (the per-channel mean and variance of the latents computed on the ImageNet training set, which are required during sampling and decoding) from Releases, and place them in the trained_weights folder.
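The latent statistics are used to standardize latents channel-wise before sampling and to invert that standardization before decoding. A minimal sketch of this round trip (function names and the channels-first layout are assumptions, not the repo's actual API):

```python
import numpy as np

def normalize_latents(z, mean, std):
    """Standardize latents per channel; z is channels-first [B, C, H, W]."""
    return (z - mean[None, :, None, None]) / std[None, :, None, None]

def denormalize_latents(z, mean, std):
    """Invert the standardization before passing latents to the decoder."""
    return z * std[None, :, None, None] + mean[None, :, None, None]
```

Normalizing and then denormalizing with the same statistics recovers the original latents exactly.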

Quick Sampling

Then, you can get the sampling outputs by running the following command:

bash run_inference.sh configs/lightningdit_xl_vavae_f16d32.yaml

The final FID-50k reported in the paper is evaluated with ADM:

git clone https://github.com/openai/guided-diffusion.git

# save your npz file with tools/save_npz.py
bash run_fid_eval.sh /path/to/your.npz

Training

Preparing Dataset

Download training dataset: ImageNet.

Complete configuration file.

Preparing Pretrained Weights

Our pretrained tokenizer and diffusion model weights can be found on Hugging Face. You can train any individual stage from our released weights, or train all stages yourself.

Tokenizer (8 x 24GB GPUs)

To train the tokenizer, you need to install additional packages:

cd vavae
pip install -r vavae_requirements.txt
git clone https://github.com/CompVis/taming-transformers.git

cd taming-transformers
pip install -e .
# patch taming-transformers: torch._six was removed in newer PyTorch versions
export FILE_PATH=./taming-transformers/taming/data/utils.py
sed -i 's/from torch._six import string_classes/from six import string_types as string_classes/' "$FILE_PATH"

Our tokenizer is fine-tuned from the VAE weights released by VA-VAE, so you need to download their VAE weights first.

Modify training config as you need, then run training by:

bash run_train.sh vavae/configs/f16d32_vfdinov2_finetune.yaml

Your training logs and checkpoints will be saved in the logs folder. We train the tokenizer on 8 RTX 4090 GPUs.

Diffusion Model (4 x 96GB GPUs)

Extract ImageNet Latents

We use VA-VAE to extract latents for all ImageNet images. During extraction, we apply horizontal flips to maintain consistency with previous works.

Modify extract_features.py to set your own data path and {output_path}, then run:

bash run_extraction.sh tokenizer/configs/vavae_f16d32.yaml
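Since flips are applied at extraction time rather than at training time, each image's latent must cover both views. An illustrative sketch of the idea (this is not the repo's extract_features.py; the `encode` callable and the store-both-views layout are assumptions):

```python
import numpy as np

def extract_with_flip(encode, image):
    """Encode an image and its horizontal flip with a VAE-style encoder.

    encode: callable mapping an [H, W, 3] array to a latent array.
    Returns both latents stacked, so training can sample either view.
    """
    z = encode(image)
    z_flip = encode(image[:, ::-1, :])  # flip along the width axis
    return np.stack([z, z_flip])
```

At training time, one of the two stored latents is picked at random per sample, which reproduces random-horizontal-flip augmentation in latent space.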
Train LightningDiT

You need to modify some necessary paths as required in configs/lightningdit_xl_vavae_f16d32.yaml.

Then, run the following command to start training:

bash run_train.sh configs/lightningdit_xl_vavae_f16d32.yaml

Your training logs and checkpoints will be saved in the output folder. We train the diffusion model on 4 RTX Pro 6000 GPUs.

🥰 Citation

Please cite us if our work is useful for your research.

@article{li2026taming,
  title={Taming Sampling Perturbations with Variance Expansion Loss for Latent Diffusion Models},
  author={Li, Qifan and Zhou, Xingyu and Zhang, Jinhua and You, Weiyi and Gu, Shuhang},
  journal={arXiv preprint arXiv:2603.21085},
  year={2026}
}

Acknowledgement

This project is mainly based on LightningDiT. Thanks for their great work.

Contact

If you have any questions, feel free to reach me at qifanli.lqf@gmail.com.
