Fix re-compilations for cross attention cache #39788

Merged

zucchini-nlp merged 1 commit into huggingface:main from zucchini-nlp:cache-cross-attn-compile on Jul 30, 2025

Conversation

@zucchini-nlp (Member) commented on Jul 30, 2025

What does this PR do?

Fixes #39774.

As per the title, using the legacy cache.key_cache[layer_idx] access emits a warning and breaks fullgraph compilation. This PR makes sure no warnings are raised when using the models in the core library.
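For illustration, here is a minimal sketch of the two access patterns involved: the legacy key_cache[i] indexing that triggers the warning, and the per-layer access this PR switches to. The exact attribute names are assumptions based on the layered cache design mentioned in the review below and may differ across transformers versions.

```python
# Minimal sketch of the two cache access patterns, assuming the layered
# DynamicCache design; attribute names may vary across transformers versions.
import torch
from transformers import DynamicCache

cache = DynamicCache()
layer_idx = 0
# Shapes are illustrative: (batch, num_heads, seq_len, head_dim).
cache.update(torch.zeros(1, 2, 3, 8), torch.zeros(1, 2, 3, 8), layer_idx)

# Legacy indexing: emits a deprecation warning, which breaks fullgraph compilation.
legacy_keys = cache.key_cache[layer_idx]

# Layered access the PR switches to: no warning, compile-friendly.
new_keys = cache.layers[layer_idx].keys
```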

@github-actions (Contributor) commented

[For maintainers] Suggested jobs to run (before merge)

run-slow: autoformer, bert, bert_generation, big_bird, bigbird_pegasus, blip, bridgetower, camembert, data2vec, electra, ernie, fsmt, gpt_bigcode, imagegpt, kosmos2, led

@manueldeprada (Contributor) left a comment


LGTM, sorry!! These changes got lost when cherry-picking back and forth between the layer[i].keys and key_cache[i] designs in the original PR 😭

@HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@zucchini-nlp (Member, Author) commented on Jul 30, 2025

No worries, that happens 😄

Let me see if I can easily add an encoder-decoder compile test in this PR, or whether we need to handle a lot of edge cases.

EDIT: oh, these aren't generative models / can't compile fullgraph, and we don't have a graph-break test for those models yet. That's why this wasn't caught in CI.
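For context, a graph-break check of the kind mentioned here could look roughly like the sketch below; it is not a test from this PR. The model choice, the cross-attention config flags, and the torch._dynamo.explain usage are assumptions and may need adjusting for the exact PyTorch/transformers versions.

```python
# Rough sketch of a graph-break check for a cross-attention forward pass.
import torch
from transformers import AutoTokenizer, BertModel

# Hypothetical encoder configured with cross attention, as in the affected models.
model = BertModel.from_pretrained(
    "bert-base-uncased", is_decoder=True, add_cross_attention=True
).eval()
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

inputs = tokenizer("hello world", return_tensors="pt")
encoder_hidden_states = torch.randn(1, 4, model.config.hidden_size)

# torch._dynamo.explain reports how many graph breaks Dynamo hit while tracing.
explanation = torch._dynamo.explain(model.forward)(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    encoder_hidden_states=encoder_hidden_states,
    use_cache=True,
)
# A regression like the one fixed here would show up as a non-zero break count.
print(explanation.graph_break_count)
```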

@zucchini-nlp zucchini-nlp merged commit 8e077a3 into huggingface:main Jul 30, 2025
25 checks passed
zaristei pushed commits to zaristei/transformers that referenced this pull request on Sep 9, 2025


Development

Successfully merging this pull request may close these issues.

Blip model got performance regression on compile mode after refactor cache.

3 participants