[Pipeline] Port Tune-A-Video pipeline to diffusers #2455
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Thanks so much, @Abhinay1997! This was quick. W.r.t. #2432 (comment): yeah, let's replace with native PyTorch ops. I think we'd want to keep …
Thank you @sayakpaul :) Got it, will update to use torch ops. OK, will retain the …
@sayakpaul just realized that the original code was built on an older …
cc @williamberman @yiyixuxu @sayakpaul could you help here?

I will take the lead here. If need be, I will ping you folks! Easy till then.
Sorry everyone, work has been hectic and I couldn't find time to fix this. I'll be working on it today :)
No problem at all. We deeply appreciate your contributions :)
So, I compared my output with the original code's output on the same inputs where I was getting noise. It's markedly different: even when the original code returns noise, there is temporal coherence. So tomorrow I'll be comparing both UNets with the same checkpoint and a fixed-seed random input. Hopefully they won't match, and I can drill down to the smallest failing component.
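The strategy described above can be sketched generically: load the same checkpoint into both implementations, feed them the same seeded input, and compare outputs. The helper `outputs_match` and the plain `Linear` stand-ins below are hypothetical, only meant to illustrate the technique, not the actual Tune-A-Video or diffusers UNets.

```python
# Sketch of the fixed-seed comparison strategy: two implementations,
# identical weights, identical seeded input; compare the outputs to
# isolate the smallest failing component.
import torch
import torch.nn as nn

def outputs_match(module_a, module_b, input_shape, seed=0, atol=1e-5):
    """Run both modules on the same seeded input and compare outputs."""
    torch.manual_seed(seed)
    x = torch.randn(input_shape)
    with torch.no_grad():
        out_a = module_a(x)
        out_b = module_b(x)
    return torch.allclose(out_a, out_b, atol=atol)

ref = nn.Linear(8, 8)   # stand-in for the original implementation
port = nn.Linear(8, 8)  # stand-in for the ported implementation
port.load_state_dict(ref.state_dict())  # "same checkpoint"

print(outputs_match(ref, port, (2, 8)))  # True when the port is faithful
```

Applied recursively (whole UNet, then each block, then each layer), this narrows the divergence down to a single component.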
In these cases, I usually do the following: …

Maybe we can follow something similar here? For starters, I'd begin by inspecting the einops that were replaced by native Torch ops.
@sayakpaul, thank you for the input. Actually, I tested the einops equivalence before I replaced the original code. See: https://colab.research.google.com/drive/1BG1b-YVNUAsy9OBEUS7cYe843k14jwCI?usp=sharing

So I'll start with the CrossAttention block and then go to the einops equivalents, instead of directly comparing the UNets themselves.
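For context on the kind of equivalence being checked: video pipelines in this family typically fold frames into the batch dimension and un-fold them with an einops pattern like `"(b f) c h w -> b c f h w"`, which must be reproduced with `view`/`permute` once einops is dropped. The function name and shapes below are illustrative, not taken from the PR.

```python
# Native-torch replacement for the einops rearrange
# "(b f) c h w -> b c f h w", as a shape sanity check.
import torch

def frames_to_video_native(x, frames):
    # (b*f, c, h, w) -> (b, c, f, h, w) using only native torch ops
    bf, c, h, w = x.shape
    b = bf // frames
    return x.view(b, frames, c, h, w).permute(0, 2, 1, 3, 4)

x = torch.randn(2 * 4, 3, 8, 8)  # batch of 2 videos, 4 frames each
out = frames_to_video_native(x, frames=4)
print(out.shape)  # torch.Size([2, 3, 4, 8, 8])
```

With einops installed, the same check can be run against `rearrange(x, "(b f) c h w -> b c f h w", f=4)` via `torch.equal`.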
Done with the suggested changes, @patrickvonplaten.
Review suggestion on the signature (swap the order of `*,` and `in_channels,`):

```diff
-    *,
-    in_channels,
+    in_channels,
+    *,
```

no?
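The position of the bare `*` matters because, in Python, parameters listed after it are keyword-only. A minimal illustration (the `block_a`/`block_b` names are hypothetical, not from the PR):

```python
# With `*` before `in_channels`, callers must pass it by keyword;
# with `in_channels` before `*`, it can also be passed positionally.
def block_a(*, in_channels, out_channels=None):
    return in_channels

def block_b(in_channels, *, out_channels=None):
    return in_channels

print(block_b(32))              # positional works: 32
print(block_a(in_channels=32))  # keyword-only: 32

try:
    block_a(32)  # positional not allowed here
except TypeError:
    print("block_a rejects positional in_channels")
```

So the suggested swap makes `in_channels` passable positionally while keeping later parameters keyword-only.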
@patrickvonplaten retained comments like …
No, that works for me!
@DN6 can you also review this in detail, especially with respect to AnimateDiff?
* src/diffusers/models/resnet.py -> type hints for rearrange_dims
* src/diffusers/models/unet_3d_blocks.py -> new apply_freeu changes
* src/diffusers/models/unet_3d_condition.py -> updated docstring [still needs checking]
Resolved merge conflicts. Can someone approve the test workflow?
Thanks Sayak! Team, just an FYI: the failing tests are unrelated to this PR.
Review comment on the dummy components:

```python
def get_dummy_components(self):
    torch.manual_seed(0)
    unet = UNet3DConditionModel(
        block_out_channels=(32, 64, 64, 64),
```

Is it possible to lower the number of channels here so that the tests run faster?
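The speedup from lowering channels is substantial because convolution parameter counts grow roughly quadratically with channel width. A rough illustration with a plain `Conv2d` (not the actual UNet blocks):

```python
# Halving a conv layer's channel width (64 -> 32) cuts its weight
# count by ~4x, which is why test configs shrink block_out_channels.
import torch.nn as nn

def n_params(module):
    return sum(p.numel() for p in module.parameters())

wide = nn.Conv2d(64, 64, kernel_size=3)    # 64*64*3*3 weights + 64 biases
narrow = nn.Conv2d(32, 32, kernel_size=3)  # 32*32*3*3 weights + 32 biases

print(n_params(wide), n_params(narrow))
print(n_params(wide) / n_params(narrow))  # close to 4
```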
Review comment on the skipped tests:

```python
def test_dict_tuple_outputs_equivalent(self, expected_max_difference=1e-4):
    pass

@unittest.skip(reason="`set_default_attn_processor` is not supported as we use a custom attn processor")
def test_save_load_local(self, expected_max_difference=5e-4):
    pass

@unittest.skip(reason="`set_default_attn_processor` is not supported as we use a custom attn processor")
def test_save_load_optional_components(self, expected_max_difference=1e-4):
    pass
```

I think rather than using the inherited tests, we should implement these tests from scratch here. We can omit setting the default processor.
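A from-scratch save/load test only needs a serialize-reload-compare round trip and never has to touch `set_default_attn_processor`. A sketch of that shape, using a hypothetical helper and a stand-in `nn.Linear` rather than the real pipeline components:

```python
# Standalone save/load round trip: serialize the state dict, reload it
# into a fresh instance, and compare outputs within a tolerance.
import os
import tempfile
import torch
import torch.nn as nn

def save_load_roundtrip(module, input_shape, atol=5e-4):
    torch.manual_seed(0)
    x = torch.randn(input_shape)
    with torch.no_grad():
        before = module(x)
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "module.pt")
        torch.save(module.state_dict(), path)
        reloaded = nn.Linear(4, 4)  # same architecture as the module below
        reloaded.load_state_dict(torch.load(path))
        with torch.no_grad():
            after = reloaded(x)
    return (before - after).abs().max().item() <= atol

print(save_load_roundtrip(nn.Linear(4, 4), (2, 4)))  # True
```

In the real test the stand-in would be replaced by the pipeline's `save_pretrained`/`from_pretrained` pair, with the same before/after output comparison.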
DN6 left a comment:

Just a couple of small requests related to tests, but otherwise LGTM 👍🏽
Co-authored-by: Dhruv Nair <dhruv.nair@gmail.com>
Will add the changes to the tests soon!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Apologies for the delay; I need to resolve the merge conflict in the UNet blocks and reduce the test runtime. I didn't get a chance to work on this over the last few weeks.
Hey @Abhinay1997, I'm super sorry, but I think the codebase is changing too fast to keep up with everything. In order to quickly finish the PR, could we maybe move everything into the research folder: https://github.com/patrickvonplaten/diffusers/tree/c27e30dcb5eaaabc30f3dbc48587ad52ee345b79/examples/research_projects ? There are a couple of powerful video models out there now (such as SVD), and I'm not sure it still makes sense to pursue this PR as a core diffusers integration. I do think, however, that it would be very valuable as a contribution to …
@patrickvonplaten, agreed about its relevance to core diffusers post-SVD. I'll move it to the … Thank you for your patience on this. I wasn't planning to drag this on for so long. I'll pick this back up as a new PR once the unicontrol PR is merged.
Thanks! |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Any updates on this? Would still love to see this done.
We decided to de-prioritize it in light of the other video pipelines we have in the library.
This PR ports the Tune-A-Video pipeline (https://github.com/showlab/Tune-A-Video), built on diffusers, to be part of the diffusers pipelines.
See discussion at: #2432
Code Sample:
TODOs
* `UNet3DConditionModel`
* `UNet3DConditionModel` weights under hf.co/showlab
* Update `models.mdx` to include `UNet3DConditionModel`; update `outputs.mdx` for the new video pipeline output
* `TextualInversionLoaderMixIn`