Description
Since "Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation" has been out for some time now, it'd be cool to officially have it supported in Diffusers 🧨
The best part is that the official repository (https://github.com/showlab/Tune-A-Video) itself builds on top of Diffusers.
Architecture-wise, the main change is to inflate the UNet to operate spatiotemporally. This is implemented in the `UNet3DConditionModel`.
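For intuition, here's a minimal sketch of that inflation pattern — not the actual `UNet3DConditionModel` code (Tune-A-Video also adds spatio-temporal attention, which is omitted here). The idea: pretrained 2D spatial layers run per frame by folding the frame axis into the batch axis, while a new temporal layer, initialized as an identity, mixes information across frames. The module name `PseudoConv3d` and all shapes below are illustrative:

```python
# Illustrative sketch of inflating a 2D layer to video, assuming
# inputs shaped (batch, channels, frames, height, width).
import torch
import torch.nn as nn


class PseudoConv3d(nn.Module):
    """Spatial 2D conv (as in the pretrained image UNet) followed by a
    temporal 1D conv initialized as identity, so the inflated layer
    initially reproduces the 2D layer's output frame by frame."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.spatial = nn.Conv2d(dim, dim, kernel_size, padding=pad)
        self.temporal = nn.Conv1d(dim, dim, kernel_size, padding=pad)
        # Identity init: the temporal conv is a no-op before tuning starts.
        nn.init.dirac_(self.temporal.weight)
        nn.init.zeros_(self.temporal.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, f, h, w = x.shape
        # Fold frames into the batch axis for the per-frame spatial conv.
        x = x.permute(0, 2, 1, 3, 4).reshape(b * f, c, h, w)
        x = self.spatial(x)
        # Fold spatial positions into the batch axis for the temporal conv
        # over the frame axis.
        x = x.reshape(b, f, c, h, w).permute(0, 3, 4, 2, 1).reshape(b * h * w, c, f)
        x = self.temporal(x)
        return x.reshape(b, h, w, c, f).permute(0, 3, 4, 1, 2)


video = torch.randn(1, 32, 8, 64, 64)  # 8 frames of 64x64 feature maps
print(PseudoConv3d(32)(video).shape)   # torch.Size([1, 32, 8, 64, 64])
```

The identity initialization matters for the one-shot setting: the inflated network starts out exactly matching the pretrained image model per frame, and tuning only has to learn the temporal behavior.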
@zhangjiewu, will it be possible to publish a few weights on the Hugging Face Hub for the community to try out quickly? Happy to help with the process :)
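If such checkpoints do land on the Hub, trying them out could look roughly like the snippet below — assuming the port follows the standard Diffusers `from_pretrained` pattern. The repo id is a placeholder, and `num_frames` is an assumed parameter name for the frame count:

```python
# Hypothetical usage sketch; "someuser/tune-a-video-example" is a
# placeholder repo id, not a real checkpoint.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "someuser/tune-a-video-example", torch_dtype=torch.float16
).to("cuda")

# num_frames is an assumed argument for the number of video frames.
result = pipe("a panda surfing on a wave", num_frames=8)
```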
We're more than happy to help if a community member wants to pick this up. As it will be the first end-to-end video pipeline in Diffusers, I'm very excited about it.