LLaVA-NeXT-Video: fix generation with cache #32527
zucchini-nlp wants to merge 2 commits into huggingface:main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
ArthurZucker
left a comment
Thanks! Do you mind adding a fast test that would catch this? 🤗
Actually, we should have VLM generation tests soon (though not very soon), but for now I'll try to make a very simple dummy test for all VLMs. It's annoying that fast tests for VLMs don't catch the most basic bugs.
Added one test for generation. I'll see if we can start integrating tests for VLMs before the refactoring is done. Last time it forced us to add many if/elses, so we gave up until everything is standardized.
ArthurZucker
left a comment
LGTM, any reason why the check is different from #32836?
Nah, both are equally valid since current VLMs don't support speculative decoding. Let's use the PR from yesterday, which has this and another fix; I'll close this one. The current state of main already has it fixed thanks to the recent refactor.
What does this PR do?
Fixes generation for llava-next-video. Generation apparently started failing after we moved to the cache class, because some parts of the code were not updated accordingly. I checked all LLaVA models; the others still work since their check is done on a different condition.
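As a rough illustration of the bug class being fixed here (a hypothetical sketch with illustrative names, not the exact transformers code): with the legacy tuple cache, "are we in the prefill step?" could be checked as `past_key_values is None`, but a Cache object is created up front and is never `None`, so the check must instead ask the cache for its current sequence length.

```python
class DynamicCacheSketch:
    """Minimal stand-in for a Cache class; only tracks how many tokens it holds."""

    def __init__(self):
        self.seen_tokens = 0

    def get_seq_length(self):
        return self.seen_tokens

    def update(self, num_new_tokens):
        self.seen_tokens += num_new_tokens


def should_merge_image_features(past_key_values):
    # Prefill step: nothing cached yet, so image features must be merged
    # (expanded) into the text embeddings. Decode steps: the prompt is
    # already cached, so only the newly generated text token is processed.
    if past_key_values is None:
        # Legacy tuple-cache path: no cache means prefill.
        return True
    if hasattr(past_key_values, "get_seq_length"):
        # Cache-class path: the object exists from step 0, so we must
        # check its length rather than its presence.
        return past_key_values.get_seq_length() == 0
    # Non-empty legacy cache: we are decoding.
    return False


cache = DynamicCacheSketch()
assert should_merge_image_features(cache) is True   # first (prefill) step
cache.update(10)                                    # prompt is now cached
assert should_merge_image_features(cache) is False  # subsequent decode steps
```

The sketch shows why a `past_key_values is None` check silently breaks once a Cache instance is passed in from the first generation step.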
Yes, we can start using `cache_position` and rely on that, but we should note that `cache_position` for VLMs will not be correct: it will contain positions only for text tokens. Support for cache positions will come in the next PR, which is in progress. We'll have to deprecate many things before we can get rid of the current checks to "merge or expand".
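To make the `cache_position` caveat concrete, here is a hypothetical back-of-the-envelope sketch (the numbers are illustrative): positions are derived from the text input ids, but the prefill actually feeds the expanded sequence where each image placeholder becomes many image-patch embeddings, so the two lengths disagree.

```python
num_text_tokens = 8      # prompt length as seen by generate() (text ids only)
num_image_patches = 576  # e.g. one <image> placeholder expanded to a 24x24 patch grid

# cache_position derived from text tokens only:
cache_position = list(range(num_text_tokens))

# Actual sequence length stored in the KV cache after merging image features
# (one placeholder token replaced by 576 patch embeddings):
real_cache_length = num_text_tokens - 1 + num_image_patches

# The mismatch is why cache_position cannot yet be trusted for VLMs.
assert len(cache_position) != real_cache_length
```

This is the gap the follow-up PR has to close before the "merge or expand" checks can be replaced with `cache_position`-based logic.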