
🚨 [v5] remove deprecated cache tuple input support #41405

Closed
gante wants to merge 7 commits into huggingface:main from gante:rm_cache_tuple_input_support

Conversation

gante (Contributor) commented on Oct 7, 2025

What does this PR do?

This PR:

  • Removes the last remnants (🤞) of tuple cache input support, scheduled for removal in v4.58 (see the sketch after this list)
  • Removes other minor items scheduled for removal in v4.58, caught in the same search
  • Updates a few incorrect past_key_values type hints
  • ⚠️ EDIT: adds cache initialization in forward to models that were missing it
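
For context, a minimal sketch of the deprecated tuple-cache pattern being removed, assuming `DynamicCache.from_legacy_cache` as the conversion shim the library exposed during the deprecation window (shapes and layer count are illustrative):

```python
import torch
from transformers import DynamicCache

# Legacy format: one (key, value) tensor pair per layer, each of shape
# (batch, num_heads, seq_len, head_dim) -- values here are illustrative.
legacy_cache = tuple(
    (torch.zeros(1, 8, 4, 64), torch.zeros(1, 8, 4, 64)) for _ in range(2)
)

# Pre-v5, model forwards silently converted such tuples via the shim below;
# after this PR, callers must construct and pass a `Cache` instance themselves.
cache = DynamicCache.from_legacy_cache(legacy_cache)
print(cache.get_seq_length())  # 4
```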

@gante gante requested review from vasqu and zucchini-nlp October 7, 2025 10:25
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

gante (Contributor, Author) commented on Oct 7, 2025

CI is failing for models that don't have cache initialization in their forward pass; working on it.

zucchini-nlp (Member) left a comment


Great

Comment thread: src/transformers/generation/utils.py (Outdated)
- if requires_cross_attention_cache and not isinstance(model_kwargs[cache_name], EncoderDecoderCache):
+ if (
+     requires_cross_attention_cache
+     and cache_name in model_kwargs
zucchini-nlp (Member)


I guess this is for special models where the cache name isn't "past_key_values". Is it not going to fail later in the model if we leave the non-EncoderDecoderCache cache?

gante (Contributor, Author)


This branch, which converts a decoder-only cache into an encoder-decoder cache, is meant to kick in when:

  • the model is encoder-decoder
  • the user has specified that they want a specific cache implementation

I'll replace cache_name with "past_key_values" to make it clearer that only models with the default cache name will use this branch.
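
For illustration, a hedged sketch of the resulting branch with the default name hard-coded (`requires_cross_attention_cache` and `model_kwargs` come from the surrounding generation utilities; the helper name is hypothetical and this is a sketch of the intent, not the exact diff):

```python
from transformers import DynamicCache, EncoderDecoderCache

def _ensure_encoder_decoder_cache(model_kwargs: dict, requires_cross_attention_cache: bool) -> None:
    # Hypothetical helper mirroring the branch above: only the default
    # "past_key_values" entry is wrapped; custom cache names are skipped.
    if (
        requires_cross_attention_cache
        and "past_key_values" in model_kwargs
        and not isinstance(model_kwargs["past_key_values"], EncoderDecoderCache)
    ):
        model_kwargs["past_key_values"] = EncoderDecoderCache(
            model_kwargs["past_key_values"],  # user-provided self-attention cache
            DynamicCache(),                   # fresh cross-attention cache
        )
```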

else:
    use_cache = False

- return_legacy_cache = False
zucchini-nlp (Member)


IMO we should prepare the cache from scratch if it is None, similar to other models. Same for other BERT-like models.

gante (Contributor, Author)


Yes, that was indeed missing 👍 adding it in this PR.
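
A minimal sketch of the kind of initialization being added, assuming the usual forward arguments (the concrete cache class varies per model, and the helper below is hypothetical):

```python
from typing import Optional

from transformers import Cache, DynamicCache, EncoderDecoderCache

def _init_cache_if_needed(
    past_key_values: Optional[Cache], use_cache: bool, is_encoder_decoder: bool
) -> Optional[Cache]:
    # Hypothetical helper: build a cache when the caller enabled caching but
    # passed none, replacing the removed legacy-tuple fallback.
    if use_cache and past_key_values is None:
        if is_encoder_decoder:
            past_key_values = EncoderDecoderCache(DynamicCache(), DynamicCache())
        else:
            past_key_values = DynamicCache()
    return past_key_values
```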


self._vision_feature_layer = kwargs.get("vision_feature_layer", -1)

@property
zucchini-nlp (Member)


Let's delete line 126 as well; it is not used in modeling.

github-actions (bot) commented on Oct 7, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: aimv2, autoformer, bark, bart, bert, bert_generation, big_bird, bigbird_pegasus, biogpt, blenderbot, blenderbot_small, blip, bloom, blt, bridgetower, camembert

vasqu (Contributor) left a comment


Don't have any major comments, but we should sync with #41378 (which I noticed halfway through here).

Seems like a bit of duplicated effort atm, so it's hard to say how to proceed without proper comms with @Cyrilvallez here. Otherwise LGTM (once the CI failures are fixed).

  encoder_hidden_states: Optional[torch.Tensor] = None,
  encoder_attention_mask: Optional[torch.Tensor] = None,
- past_key_values: Optional[Union[list[torch.FloatTensor], Cache]] = None,
+ past_key_values: Optional[Cache] = None,
vasqu (Contributor)


Finally 🙏

Comment on lines +401 to 402
if isinstance(past_key_values, DynamicCache):
    past_key_values = EncoderDecoderCache(past_key_values, DynamicCache(config=self.config))
vasqu (Contributor)


Not super important, but we still need this if generation checks for cross-attention cache necessity and converts if necessary (also slightly changed in this PR for other cache names, I guess). It might be that this model needs this custom logic; I'm not familiar with the model here.

Relevant lines in main:

if requires_cross_attention_cache and not isinstance(model_kwargs[cache_name], EncoderDecoderCache):
    model_kwargs[cache_name] = EncoderDecoderCache(
        model_kwargs[cache_name],  # self-attention cache
        DynamicCache(**dynamic_cache_kwargs),  # cross-attention cache
    )

)
use_cache = False

- return_legacy_cache = False
vasqu (Contributor)


We have a lot of deletions like this one across a lot of models. Do all of them need a cache init, or is it more model-dependent? (Sorry for the mess here on the dependencies in case we need them all 😢)

gante (Contributor, Author) commented on Oct 7, 2025

@vasqu indeed, duplicated work. I'm going to sync with @Cyrilvallez.

gante (Contributor, Author) commented on Oct 7, 2025

Closing in favor of #41378, which is more complete!

gante closed this on Oct 7, 2025