🚨 [v5] remove deprecated cache tuple input support #41405
gante wants to merge 7 commits into huggingface:main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
CI is failing for models that don't contain cache initialization in their forward pass, working on it
    - if requires_cross_attention_cache and not isinstance(model_kwargs[cache_name], EncoderDecoderCache):
    + if (
    +     requires_cross_attention_cache
    +     and cache_name in model_kwargs
I guess this is for special models where the cache name isn't "past_key_values". Isn't it going to fail later in the model if we leave a non-EncoderDecoderCache cache?
This branch, which converts a decoder-only cache into an encoder-decoder cache, is meant to kick in when:
- the model is encoder-decoder
- the user has specified that they want a specific cache implementation
I'll replace cache_name with "past_key_values" to make it clearer that only models with the default cache name will use this branch. A minimal sketch of that branch is below.
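A minimal sketch of the intended behavior, assuming the generation-internal names `model_kwargs` and `requires_cross_attention_cache` from `generation/utils.py`; the helper function itself is hypothetical, not the actual diff:

```python
from typing import Optional
from transformers import Cache, DynamicCache, EncoderDecoderCache

def ensure_encoder_decoder_cache(model_kwargs: dict, requires_cross_attention_cache: bool) -> dict:
    # Hypothetical helper illustrating the branch discussed above: if the model
    # is encoder-decoder and the user passed a decoder-only cache under the
    # default "past_key_values" name, wrap it in an EncoderDecoderCache so the
    # cross-attention layers get their own cache.
    cache: Optional[Cache] = model_kwargs.get("past_key_values")
    if (
        requires_cross_attention_cache
        and cache is not None
        and not isinstance(cache, EncoderDecoderCache)
    ):
        model_kwargs["past_key_values"] = EncoderDecoderCache(cache, DynamicCache())
    return model_kwargs
```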
    else:
        use_cache = False

    return_legacy_cache = False
IMO we should prepare the cache from scratch if it is None, similar to other models. Same for other BERT-like models.
Yes, that was indeed missing 👍 adding it in this PR
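A minimal sketch of the kind of initialization being added, assuming the standard `use_cache`/`past_key_values` forward arguments; the helper name is hypothetical and the exact placement varies per model:

```python
from typing import Optional
from transformers import Cache, DynamicCache

def init_cache_if_missing(past_key_values: Optional[Cache], use_cache: bool) -> Optional[Cache]:
    # Hypothetical helper: models that previously relied on the legacy tuple
    # path now create a Cache object inside forward when caching is requested
    # but no cache was passed in.
    if use_cache and past_key_values is None:
        past_key_values = DynamicCache()
    return past_key_values
```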
    self._vision_feature_layer = kwargs.get("vision_feature_layer", -1)

    @property
Let's delete line 126 as well, it is not used in modeling.
[For maintainers] Suggested jobs to run (before merge): run-slow: aimv2, autoformer, bark, bart, bert, bert_generation, big_bird, bigbird_pegasus, biogpt, blenderbot, blenderbot_small, blip, bloom, blt, bridgetower, camembert
vasqu left a comment
Don't have any major comments, but we should sync with #41378 (which I noticed halfway through here).
Seems like a bit of duplicated effort atm, so it's hard to say how to proceed without proper comms with @Cyrilvallez here. Otherwise LGTM (once the CI failures are fixed).
      encoder_hidden_states: Optional[torch.Tensor] = None,
      encoder_attention_mask: Optional[torch.Tensor] = None,
    - past_key_values: Optional[Union[list[torch.FloatTensor], Cache]] = None,
    + past_key_values: Optional[Cache] = None,
    if isinstance(past_key_values, DynamicCache):
        past_key_values = EncoderDecoderCache(past_key_values, DynamicCache(config=self.config))
Not super important, but we still need this if generation checks for cross-attention cache necessity and converts when needed (also slightly changed in this PR for other cache names, I guess). It might be that this model needs this custom logic; I'm not familiar with the model here.
Relevant lines in main: transformers/src/transformers/generation/utils.py, lines 2019 to 2023 at 0464d9e
    )
    use_cache = False

    return_legacy_cache = False
We have a lot of deletions like this across a lot of models. Do all of them need a cache init, or is it more model-dependent? (Sorry for the mess here on the dependencies in case we need them all 😢)
@vasqu indeed duplicated work, I'm going to sync with @Cyrilvallez.
Closing in favor of #41378, which is more complete!
What does this PR do?

This PR:
- removes support for the deprecated tuple cache inputs, updating the `past_key_values` type hints accordingly
- adds cache initialization to `forward` on models that were missing it
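For downstream users, a minimal usage sketch of the post-PR contract; the model name is an illustrative assumption, not from the PR. `forward` now expects a `Cache` object rather than the legacy tuple-of-tuples:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Hello", return_tensors="pt")

# Pass a Cache object (or nothing); the deprecated tuple-of-tuples
# format is no longer accepted by forward.
cache = DynamicCache()
with torch.no_grad():
    outputs = model(**inputs, past_key_values=cache, use_cache=True)
print(type(outputs.past_key_values))  # a Cache subclass, e.g. DynamicCache
```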