LLaVA-NeXT-Video: fix generation with cache #32527
zucchini-nlp wants to merge 2 commits into huggingface:main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
ArthurZucker
left a comment
Thanks! Do you mind adding a fast test that would catch this? 🤗
Actually, we should have VLM generation tests soon (though not very soon), but for now I'll try to make a very simple dummy test for all VLMs. It's annoying that fast tests for VLMs don't catch the most basic bugs.
Added one test for generation. I'll see if we can start integrating tests for VLMs before the refactoring is done. Last time it forced us to add many if/elses, so we gave up until everything is standardized.
ArthurZucker
left a comment
LGTM, any reason why the check is different from #32836?
Nah, both are equally valid since current VLMs don't support speculative decoding. Let's use the PR from yesterday, which has this and another fix; I'll close this one. The current state of main already has it fixed thanks to the recent refactor.
What does this PR do?
Fixes generation for llava-next-video. Generation apparently started failing after we moved to the cache class, because some parts of the code were not updated accordingly. I checked all LLaVA models; the others still work since their check is done on a different condition.
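As a rough illustration of the bug class being fixed here (a hypothetical sketch with illustrative names, not the exact transformers code): with the legacy tuple cache, "are we in the prefill step?" could be checked as `past_key_values is None`, but a Cache object is created up front and is never `None`, so the check must instead ask the cache for its current sequence length.

```python
class DynamicCacheSketch:
    """Minimal stand-in for a Cache class; only tracks how many tokens it holds."""

    def __init__(self):
        self.seen_tokens = 0

    def get_seq_length(self):
        return self.seen_tokens

    def update(self, num_new_tokens):
        self.seen_tokens += num_new_tokens


def should_merge_image_features(past_key_values):
    # Prefill step: nothing cached yet, so image features must be merged
    # (expanded) into the text embeddings. Decode steps: the prompt is
    # already cached, so only the newly generated text token is processed.
    if past_key_values is None:
        # Legacy tuple-cache path: no cache means prefill.
        return True
    if hasattr(past_key_values, "get_seq_length"):
        # Cache-class path: the object exists from step 0, so we must
        # check its length rather than its presence.
        return past_key_values.get_seq_length() == 0
    # Non-empty legacy cache: we are decoding.
    return False


cache = DynamicCacheSketch()
assert should_merge_image_features(cache) is True   # first (prefill) step
cache.update(10)                                    # prompt is now cached
assert should_merge_image_features(cache) is False  # subsequent decode steps
```

The sketch shows why a `past_key_values is None` check silently breaks once a Cache instance is passed in from the first generation step.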
Yes, we can start using `cache_position` and rely on that, but we should note that `cache_position` for VLMs will not be correct: it will contain positions only for text tokens. Support for cache positions will come in the next PR, which is in progress. We'll have to deprecate many things before we can get rid of the current checks to "merge or expand".
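To make the `cache_position` caveat concrete, here is a hypothetical back-of-the-envelope sketch (the numbers are illustrative): positions are derived from the text input ids, but the prefill actually feeds the expanded sequence where each image placeholder becomes many image-patch embeddings, so the two lengths disagree.

```python
num_text_tokens = 8      # prompt length as seen by generate() (text ids only)
num_image_patches = 576  # e.g. one <image> placeholder expanded to a 24x24 patch grid

# cache_position derived from text tokens only:
cache_position = list(range(num_text_tokens))

# Actual sequence length stored in the KV cache after merging image features
# (one placeholder token replaced by 576 patch embeddings):
real_cache_length = num_text_tokens - 1 + num_image_patches

# The mismatch is why cache_position cannot yet be trusted for VLMs.
assert len(cache_position) != real_cache_length
```

This is the gap the follow-up PR has to close before the "merge or expand" checks can be replaced with `cache_position`-based logic.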