Description
Since "Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation" has been out for some time now, it'd be cool to officially have it supported in Diffusers 🧨
The best part is that the official repository (https://github.com/showlab/Tune-A-Video) itself builds on top of Diffusers.
Architecture-wise, the main change is to inflate the UNet to operate spatiotemporally. This is implemented in the `UNet3DConditionModel`.
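For intuition, here's a minimal sketch of that inflation pattern — not the actual `UNet3DConditionModel` code (Tune-A-Video also adds spatio-temporal attention, which is omitted here). The idea: pretrained 2D spatial layers run per frame by folding the frame axis into the batch axis, while a new temporal layer, initialized as an identity, mixes information across frames. The module name `PseudoConv3d` and all shapes below are illustrative:

```python
# Illustrative sketch of inflating a 2D layer to video, assuming
# inputs shaped (batch, channels, frames, height, width).
import torch
import torch.nn as nn


class PseudoConv3d(nn.Module):
    """Spatial 2D conv (as in the pretrained image UNet) followed by a
    temporal 1D conv initialized as identity, so the inflated layer
    initially reproduces the 2D layer's output frame by frame."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.spatial = nn.Conv2d(dim, dim, kernel_size, padding=pad)
        self.temporal = nn.Conv1d(dim, dim, kernel_size, padding=pad)
        # Identity init: the temporal conv is a no-op before tuning starts.
        nn.init.dirac_(self.temporal.weight)
        nn.init.zeros_(self.temporal.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, f, h, w = x.shape
        # Fold frames into the batch axis for the per-frame spatial conv.
        x = x.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
        x = self.spatial(x)
        # Fold spatial positions into the batch axis for the temporal conv
        # over the frame axis.
        x = x.reshape(b, f, c, h, w).permute(0, 3, 4, 2, 1).reshape(b * h * w, c, f)
        x = self.temporal(x)
        return x.reshape(b, h, w, c, f).permute(0, 3, 4, 1, 2)


video = torch.randn(1, 32, 8, 64, 64)  # 8 frames of 64x64 feature maps
print(PseudoConv3d(32)(video).shape)   # torch.Size([1, 32, 8, 64, 64])
```

The identity initialization matters for the one-shot setting: the inflated network starts out exactly matching the pretrained image model per frame, and tuning only has to learn the temporal behavior.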
@zhangjiewu, will it be possible to publish a few weights on the Hugging Face Hub for the community to try out quickly? Happy to help with the process :)
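If such checkpoints do land on the Hub, trying them out could look roughly like the snippet below — assuming the port follows the standard Diffusers `from_pretrained` pattern. The repo id is a placeholder, and `num_frames` is an assumed parameter name for the frame count:

```python
# Hypothetical usage sketch; "someuser/tune-a-video-example" is a
# placeholder repo id, not a real checkpoint.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "someuser/tune-a-video-example", torch_dtype=torch.float16
).to("cuda")

# num_frames is an assumed argument for the number of video frames.
result = pipe("a panda surfing on a wave", num_frames=8)
```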
We're more than happy to help if a community member wants to pick this up. As it will be the first end-to-end video pipeline in Diffusers, I'm very excited about it.