
Taming Sampling Perturbations with Variance Expansion Loss for Latent Diffusion Models (CVPR 2026)

Qifan Li, Xingyu Zhou, Jinhua Zhang, Weiyi You, Shuhang Gu


⭐If you like this work, please help star this repo. Thanks!🤗

Visualization Results

Dependencies and Installation

# git clone this repository
git clone https://github.com/CVL-UESTC/VE-Loss.git
cd VE-Loss

conda create -n VE-Loss python
conda activate VE-Loss
pip install -r requirements.txt

Inference

Download Pre-trained Models

Download the pretrained tokenizer weights, the diffusion model weights, and the latent statistics file (the per-channel mean and variance of the latents computed on the ImageNet training set, which are required during sampling and decoding) from Releases, and place them in the trained_weights folder.
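The latent statistics are used to standardize latents channel-wise before sampling and to invert that standardization before decoding. A minimal sketch of this round trip (function names and the channels-first layout are assumptions, not the repo's actual API):

```python
import numpy as np

def normalize_latents(z, mean, std):
    """Standardize latents per channel; z is channels-first [B, C, H, W]."""
    return (z - mean[None, :, None, None]) / std[None, :, None, None]

def denormalize_latents(z, mean, std):
    """Invert the standardization before passing latents to the decoder."""
    return z * std[None, :, None, None] + mean[None, :, None, None]
```

Normalizing and then denormalizing with the same statistics recovers the original latents exactly.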

Quick Sampling

Then, you can get the sampling outputs by running the following command:

bash run_inference.sh configs/lightningdit_xl_vavae_f16d32.yaml

The final FID-50k reported in the paper is evaluated with ADM:

git clone https://github.com/openai/guided-diffusion.git

# save your npz file with tools/save_npz.py
bash run_fid_eval.sh /path/to/your.npz

Training

Preparing Dataset

Download training dataset: ImageNet.

Complete configuration file.

Preparing Pretrained Weights

Our pretrained tokenizer and diffusion model weights can be found on Hugging Face. You can train any individual stage from our released weights, or train all stages yourself.

Tokenizer (8 x 24GB GPUs)

To train the tokenizer, you need to install additional packages:

cd vavae
pip install -r vavae_requirements.txt
git clone https://github.com/CompVis/taming-transformers.git

cd taming-transformers
pip install -e .
# patch taming-transformers: torch._six was removed in newer PyTorch versions
export FILE_PATH=./taming-transformers/taming/data/utils.py
sed -i 's/from torch._six import string_classes/from six import string_types as string_classes/' "$FILE_PATH"

Our tokenizer is fine-tuned from the VAE weights released by VA-VAE, so you need to download their VAE weights first.

Modify training config as you need, then run training by:

bash run_train.sh vavae/configs/f16d32_vfdinov2_finetune.yaml

Your training logs and checkpoints will be saved in the logs folder. We train the tokenizer on 8 RTX 4090 GPUs.

Diffusion Model (4 x 96GB GPUs)

Extract ImageNet Latents

We use VA-VAE to extract latents for all ImageNet images. During extraction, we apply horizontal flips to maintain consistency with previous works.

Modify extract_features.py to set your own data path and {output_path}, then run:

bash run_extraction.sh tokenizer/configs/vavae_f16d32.yaml
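Since flips are applied at extraction time rather than at training time, each image's latent must cover both views. An illustrative sketch of the idea (this is not the repo's extract_features.py; the `encode` callable and the store-both-views layout are assumptions):

```python
import numpy as np

def extract_with_flip(encode, image):
    """Encode an image and its horizontal flip with a VAE-style encoder.

    encode: callable mapping an [H, W, 3] array to a latent array.
    Returns both latents stacked, so training can sample either view.
    """
    z = encode(image)
    z_flip = encode(image[:, ::-1, :])  # flip along the width axis
    return np.stack([z, z_flip])
```

At training time, one of the two stored latents is picked at random per sample, which reproduces random-horizontal-flip augmentation in latent space.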
Train LightningDiT

You need to modify some necessary paths as required in configs/lightningdit_xl_vavae_f16d32.yaml.

Then, run the following command to start training:

bash run_train.sh configs/lightningdit_xl_vavae_f16d32.yaml

Your training logs and checkpoints will be saved in the output folder. We train the diffusion model on 4 RTX Pro 6000 GPUs.

🥰 Citation

Please cite us if our work is useful for your research.

@article{li2026taming,
  title={Taming Sampling Perturbations with Variance Expansion Loss for Latent Diffusion Models},
  author={Li, Qifan and Zhou, Xingyu and Zhang, Jinhua and You, Weiyi and Gu, Shuhang},
  journal={arXiv preprint arXiv:2603.21085},
  year={2026}
}

Acknowledgement

This project is mainly based on LightningDiT. Thanks for their great work.

Contact

If you have any questions, feel free to reach me at qifanli.lqf@gmail.com.
