
Smart-Diffusion


This repository is the standalone, ready-to-use edition of Chitu-Diffusion.

Chitu is a high-performance LLM inference framework jointly developed by the PACMAN team at Tsinghua University and QingCheng.ai. To support the rapidly growing diffusion ecosystem, we have restructured the DiT model under Chitu's API and scheduling philosophy, retaining scheduling flexibility while delivering extreme performance. Our goal is a truly simple and easy-to-use AIGC acceleration framework.

Chitu-Diffusion is currently in the testing and development phase. We are working hard to make it better! We welcome anyone interested to join our team, use, test, and participate in the development.

We currently support the Wan-T2V series and are continuously adding support for new models, operators, and algorithm optimizations.

Setup

Environment

Recommended software environment: Python 3.12, CUDA 12.4

Install the dependencies listed in chitu/diffusion/requirements.txt, then run:

```shell
pip install -e .
```

Flash Attention is best installed from a prebuilt wheel: https://github.com/Dao-AILab/flash-attention/releases/tag/v2.7.1.post2

Model Checkpoint

Supported model-ids:

  • Wan-AI/Wan2.1-T2V-1.3B
  • Wan-AI/Wan2.1-T2V-14B
  • Wan-AI/Wan2.2-T2V-A14B

Run Demo

Model architecture parameters (number of layers, attention heads, etc.) are static and set in chitu/config/models/<diffusion-model>.yaml.

User parameters (generation steps, shape, etc.) are dynamic. Chitu provides DiffusionUserParams to set them on a per-request basis.

System parameters (parallelism, operators, acceleration algorithms, etc.) are set in the launch args of Chitu.

Test script: chitu/diffusion/test_generate.py

Single-card or distributed launch:

```shell
bash srun_wan_demo.sh <num_gpus>
```


Magic Parameters Explained

infer.diffusion.low_mem_level

Low Memory Level: This parameter controls the proportion of GPU memory used by the model, effectively preventing out-of-memory (OOM) issues and allowing the model to run with limited GPU memory.

| Level | Description |
| --- | --- |
| 0 | All models are loaded directly onto the GPU. |
| 1 | VAE tiling is enabled. |
| 2 | The T5 text encoder is offloaded to the CPU. |
| >3 | The DiT model is offloaded to the CPU. |

By properly setting infer.diffusion.low_mem_level, you can flexibly adjust the model loading strategy according to the available GPU memory, ensuring efficient operation of the model with limited resources.
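The levels in the table read naturally as a cumulative offloading plan. A hypothetical helper, assuming each level also applies the measures of all lower levels (the cumulative interpretation is an assumption; the table lists one action per level):

```python
def plan_for_low_mem_level(level: int) -> dict:
    """Hypothetical sketch of infer.diffusion.low_mem_level semantics,
    assuming higher levels include all lower-level memory-saving measures."""
    return {
        "vae_tiling": level >= 1,   # level 1: VAE tiling enabled
        "offload_t5": level >= 2,   # level 2: T5 text encoder on CPU
        "offload_dit": level > 3,   # level >3: DiT model on CPU
    }

print(plan_for_low_mem_level(0))  # everything stays on the GPU
print(plan_for_low_mem_level(2))  # VAE tiling on, T5 offloaded
```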

infer.diffusion.enable_flexcache [Trial]

Enable FlexCache: When enabled, the Diffusion backend initializes the FlexCache Manager, which provides uniform support for lossy acceleration algorithms based on feature reuse.
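Feature-reuse algorithms skip recomputing a block when its input has barely changed between denoising steps, trading a small accuracy loss for speed; TeaCache is one such policy. A simplified, self-contained sketch of the idea (not Chitu's FlexCache implementation):

```python
class FeatureReuseCache:
    """Toy feature-reuse cache: recompute only when the input drifts
    beyond a relative threshold; otherwise return the cached output.
    A sketch of the general idea, not Chitu's FlexCache API."""

    def __init__(self, threshold: float = 0.05):
        self.threshold = threshold
        self._last_input = None
        self._last_output = None

    def __call__(self, x: list[float], compute) -> list[float]:
        if self._last_input is not None:
            # Relative L1 drift between this input and the last computed one.
            drift = sum(abs(a - b) for a, b in zip(x, self._last_input))
            norm = sum(abs(b) for b in self._last_input) or 1.0
            if drift / norm < self.threshold:
                return self._last_output  # reuse: lossy but fast
        self._last_input = list(x)
        self._last_output = compute(x)
        return self._last_output

calls = []
def expensive_block(x):
    calls.append(1)                 # count real computations
    return [v * 2 for v in x]

cache = FeatureReuseCache(threshold=0.05)
cache([1.0, 1.0], expensive_block)    # first call: computed
cache([1.0, 1.001], expensive_block)  # near-identical input: reused
cache([2.0, 2.0], expensive_block)    # large drift: recomputed
print(len(calls))  # prints 2
```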

Parameter Description

Set infer.diffusion.enable_flexcache=true in the launch script and set the corresponding user parameters.

Teacache is currently supported; select it via the flexcache field of DiffusionUserParams:

```python
DiffusionUserParams(
    role="Alex",
    prompt="A cat walking on grass.",
    ...
    flexcache='teacache',
)
```
