[BUG] Wan2.2 5B training demo failed in DiffSynth 2.0 #1175

I'm encountering a training regression after upgrading to DiffSynth 2.0. When I run the official training script `example/wanvideo/model_training/full/Wan2.2-TI2V-5B.sh`, the resulting model generates severely distorted outputs, particularly in the first few frames of the generated video. See the example output:

https://github.com/user-attachments/assets/771501fc-dd2b-4e76-8978-d2393690934f

However, when I downgrade the codebase to v1.19 (and use the corresponding training script from that release), training succeeds and produces the expected results, with no such artifacts. I have compared the corresponding code in the two versions but cannot tell what makes the difference. I think it should be a bug in 2.0. Can anyone help?

Testing env: torch==2.5.1+cu12.4 torchvision==0.20.1
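
To make the two setups easier to compare, here is a minimal sketch for dumping installed package versions in each environment (the v1.19 one and the 2.0 one). It only uses the standard library. The package names in the list, in particular `diffsynth`, are my assumption about how the packages are registered with pip, so adjust them as needed.

```python
from importlib import metadata


def collect_versions(packages):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to the string "not installed".
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Assumed distribution names; edit to match your environment.
    for name, version in collect_versions(["torch", "torchvision", "diffsynth"]).items():
        print(f"{name}=={version}")
```

Running this in both environments and diffing the output should show whether anything besides the codebase version differs between the working and the failing setup.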
