⭐If you like this work, please help star this repo. Thanks!🤗
# git clone this repository
git clone https://github.com/CVL-UESTC/VE-Loss.git
cd VE-Loss
conda create -n VE-Loss python
conda activate VE-Loss
pip install -r requirements.txt
Download the pretrained tokenizer weights, diffusion model weights, and the latent statistics (the per-channel mean and variance of the latents, computed on the ImageNet-train dataset, which are required during sampling and decoding) from Releases and place them in the trained_weights folder.
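The per-channel statistics are used to normalize latents before diffusion and de-normalize them before decoding. A minimal sketch of that normalization, assuming a simple `{"mean", "std"}` layout for the stats (the names and shapes here are illustrative, not the repo's actual API):

```python
import numpy as np

# Illustrative stats: one mean and one std per latent channel
# (in the real pipeline these come from the ImageNet-train latents).
stats = {
    "mean": np.random.randn(32, 1, 1).astype(np.float32),
    "std": np.abs(np.random.randn(32, 1, 1)).astype(np.float32) + 0.1,
}

def normalize(latent, stats):
    """Map a raw tokenizer latent (C, H, W) to zero mean / unit variance per channel."""
    return (latent - stats["mean"]) / stats["std"]

def denormalize(latent, stats):
    """Invert the normalization before feeding latents to the tokenizer decoder."""
    return latent * stats["std"] + stats["mean"]

latent = np.random.randn(32, 16, 16).astype(np.float32)
roundtrip = denormalize(normalize(latent, stats), stats)
assert np.allclose(roundtrip, latent, atol=1e-5)
```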
Then, you can get the sampling outputs by running the following command:
bash run_inference.sh configs/lightningdit_xl_vavae_f16d32.yaml
The final FID-50k reported in the paper is evaluated with ADM:
git clone https://github.com/openai/guided-diffusion.git
# save your npz file with tools/save_npz.py
bash run_fid_eval.sh /path/to/your.npz
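ADM's evaluator reads a single .npz archive of uint8 sample images. A minimal sketch of writing one (this mirrors what tools/save_npz.py does, but the exact layout is an assumption; check the script for the real format):

```python
import numpy as np

def save_samples_npz(images, out_path):
    """Pack generated samples into an npz archive for ADM's FID evaluator.

    images: uint8 array of shape (N, H, W, 3). Saving positionally stores
    the array under the key 'arr_0', which np.load exposes on read.
    """
    assert images.dtype == np.uint8 and images.ndim == 4
    np.savez(out_path, images)

# Illustrative: random "samples" standing in for real sampler outputs.
samples = np.random.randint(0, 256, size=(8, 256, 256, 3), dtype=np.uint8)
save_samples_npz(samples, "samples.npz")
```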
Download the training dataset: ImageNet.
Complete the configuration file.
Our pretrained weights for the tokenizer and diffusion model can be found on Hugging Face. You can train any individual stage based on our released weights, or train all stages yourself.
If you want to train the tokenizer, you need to install additional packages:
cd vavae
pip install -r vavae_requirements.txt
git clone https://github.com/CompVis/taming-transformers.git
cd taming-transformers
pip install -e .
# patch taming-transformers for newer PyTorch versions (torch._six was removed)
export FILE_PATH=./taming-transformers/taming/data/utils.py
sed -i 's/from torch._six import string_classes/from six import string_types as string_classes/' "$FILE_PATH"
Our tokenizer is fine-tuned from the VAE weights released by VA-VAE, so you need to download their VAE weights first.
Modify training config as you need, then run training by:
bash run_train.sh vavae/configs/f16d32_vfdinov2_finetune.yaml
Your training logs and checkpoints will be saved in the logs folder. We train the tokenizer with 8 RTX 4090 GPUs.
We use VA-VAE to extract latents for all ImageNet images. During extraction, we apply random horizontal flips to stay consistent with previous works. Modify extract_features.py to set your own data path and {output_path}, then run:
bash run_extraction.sh tokenizer/configs/vavae_f16d32.yaml
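The horizontal-flip convention means each image contributes two latents, one for the original and one for its mirror, so training can sample either without re-encoding. A minimal sketch under that assumption (the encoder stand-in and function names here are illustrative, not the repo's code):

```python
import numpy as np

def encode(image):
    """Stand-in for the VA-VAE encoder; returns a latent for a (C, H, W) image.

    Illustrative only: a real encoder is a network forward pass.
    """
    return image.mean(axis=0, keepdims=True)

def extract_latents(image):
    """Encode an image and its horizontal mirror, as done during extraction."""
    flipped = image[:, :, ::-1]  # flip along the width axis
    return encode(image), encode(flipped)

img = np.random.rand(3, 256, 256).astype(np.float32)
lat, lat_flip = extract_latents(img)
```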
Modify the necessary paths in configs/lightningdit_xl_vavae_f16d32.yaml.
Then, run the following command to start training:
bash run_train.sh configs/lightningdit_xl_vavae_f16d32.yaml
Your training logs and checkpoints will be saved in the output folder. We train the diffusion model with 4 RTX Pro 6000 GPUs.
Please cite us if our work is useful for your research.
@article{li2026taming,
title={Taming Sampling Perturbations with Variance Expansion Loss for Latent Diffusion Models},
author={Li, Qifan and Zhou, Xingyu and Zhang, Jinhua and You, Weiyi and Gu, Shuhang},
journal={arXiv preprint arXiv:2603.21085},
year={2026}
}
This project is mainly based on LightningDiT. Thanks for their great work.
If you have any questions, feel free to reach out to me at qifanli.lqf@gmail.com

