
LLaVA-NeXT-Video: fix generation with cache #32527

Closed
zucchini-nlp wants to merge 2 commits into huggingface:main from zucchini-nlp:llava_cache

Conversation

@zucchini-nlp
Member

What does this PR do?

Fixes generation for llava-next-video. Generation apparently started failing after we moved to the cache class, because some parts of the code were not updated. I checked all LLaVA models; the others still work because their check relies on a different condition.

Yes, we could start using cache_position and rely on that, but note that cache_position for VLMs will not be correct: it will contain positions only for the text tokens. Support for cache_position will come in the next PR, which is in progress. We'll have to deprecate many things before we can get rid of the current checks to "merge or expand".
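A minimal, hypothetical sketch of the kind of breakage described above (the names `CacheStub` and `num_cached_tokens` are invented for illustration, not the actual transformers modeling code): legacy code read the number of cached tokens by indexing into a tuple of per-layer `(key, value)` tensors, while the Cache class exposes that length through a `get_seq_length()` accessor, so a condition written only for the tuple format no longer describes the cache's state once a Cache object is passed in.

```python
# Toy illustration of the two cache formats; CacheStub is a hypothetical
# stand-in, not the real transformers Cache class.

class CacheStub:
    """Stand-in for a Cache object (hypothetical)."""

    def __init__(self, seq_len=0):
        self._seq_len = seq_len

    def get_seq_length(self, layer_idx=0):
        # Number of tokens already cached; 0 before the prefill pass.
        return self._seq_len


def num_cached_tokens(past_key_values):
    """Return the number of cached tokens for either cache format."""
    if past_key_values is None:
        return 0
    if hasattr(past_key_values, "get_seq_length"):
        # Cache-class path: use the public accessor.
        return past_key_values.get_seq_length()
    # Legacy path: tuple of per-layer (key, value) tensors with
    # key.shape == (batch, num_heads, seq_len, head_dim).
    return past_key_values[0][0].shape[2]


print(num_cached_tokens(None))          # 0: no cache yet
print(num_cached_tokens(CacheStub(7)))  # 7: the cache object reports its own length
```

A "merge or expand" style condition that only handles the legacy tuple branch silently stops describing the cache once a Cache object flows through, which is the shape of the regression this PR fixes.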

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@ArthurZucker
Collaborator

Thanks! Do you mind adding a fast test that would catch this? 🤗

@zucchini-nlp
Member Author

Actually we should have VLM generation tests soon (well, not very soon), but for now I'll try to make a very basic dummy test for all VLMs. It is annoying that the fast tests for VLMs don't catch the most basic bugs.

@zucchini-nlp
Copy link
Copy Markdown
Member Author

Added one test for generation. I will see if we can start integrating tests for VLMs before the refactoring is done. Last time it forced us to add many if/elses, so we gave up until everything is standardized.

@ArthurZucker
Collaborator

LGTM, any reason why the check is different from the one in #32836?

@zucchini-nlp
Member Author

Nah, both are equally valid since current VLMs don't support speculative decoding. Let's use yesterday's PR, which has this and another fix; I'll close this one. The current state of main already has this fixed thanks to the recent refactor.
