Remove mask slicing in all eager attentions #42186

Merged
Cyrilvallez merged 9 commits into main from remove-slice on Feb 10, 2026

Conversation

@Cyrilvallez Cyrilvallez commented Nov 13, 2025

What does this PR do?

As per the title. Following #41900.

The mask is (and should be!!) correctly prepared, with the correct shape. If not, then it's better to crash immediately, as otherwise this leads to very silent bugs!!
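To make the behavioral change concrete, here is a minimal sketch of a Transformers-style eager attention. The function name, signature, and tensor layout `(batch, heads, seq, head_dim)` are illustrative assumptions, not the actual library code; the point is only the masking step, where the removed slice used to hide shape mismatches:

```python
import torch

def eager_attention(query, key, value, attention_mask=None, scaling=None):
    # Illustrative sketch, assuming (batch, heads, seq, head_dim) tensors.
    if scaling is None:
        scaling = query.shape[-1] ** -0.5
    attn_weights = torch.matmul(query, key.transpose(2, 3)) * scaling
    if attention_mask is not None:
        # Before this PR, code like attention_mask[:, :, :, : key.shape[-2]]
        # silently truncated a mis-shaped mask. Adding the mask as-is means
        # a wrongly prepared mask raises a broadcasting error immediately
        # instead of producing subtly wrong attention scores.
        attn_weights = attn_weights + attention_mask
    attn_weights = torch.softmax(attn_weights, dim=-1)
    return torch.matmul(attn_weights, value)
```

With a correctly shaped mask this behaves as before; with a mask whose key-length dimension is wrong, the addition fails loudly rather than being sliced into silence.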
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@justinchuby (Contributor)

Was this able to move forward? Thanks!

@Cyrilvallez (Member, Author)

Hey @justinchuby! This has been on standby for a bit, as a lot of errors popped up because some old models do not create their masks correctly...
We are currently removing old mask preparation primitives from every model, so once that is done this PR should be able to land immediately!

@github-actions (Contributor)

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=42186&sha=e2efac

@Cyrilvallez (Member, Author)

Finally ready now that #42848 was merged.

@vasqu (Contributor) left a comment

Wdyt about using 2 sources for eager (because some do fp32 softmax and some don't) to unify this? Talking about the models that are not using modular atm --> use "copied from" to ensure we only modify these.

Just a thought which might make maintenance easier + I don't have a good overview of which ones are independent atm.

Comment on lines 277 to 278
if attention_mask is not None:
causal_mask = attention_mask[:, :, :, : key_layer.shape[-1]]
Contributor
Oops :D

Member Author

Damn nice catch! This one thought it could keep on living... 😈

Comment thread src/transformers/models/diffllama/modeling_diffllama.py Outdated
Comment thread src/transformers/models/diffllama/modular_diffllama.py Outdated
@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: afmoe, albert, align, apertus, arcee, aria, audio_spectrogram_transformer, audioflamingo3, bamba, bart, bert, bert_generation, bigbird_pegasus, biogpt, bitnet, blenderbot

@Cyrilvallez (Member, Author)

I agree that in general it would be very nice to unify the eager implementations a bit more, since they are all the same but use different variable names etc. for the same things.
This feels a bit outside the scope of this PR though, which already touches a LOT of files and is focused on removing a useless and dangerous slicing op. I would prefer to do it later on to avoid overcomplicating this one.

@vasqu (Contributor) left a comment

Yup, it's fine that way - wanna run a few slow tests (run-slow) on important models for safety? Otherwise lgtm

@Cyrilvallez (Member, Author)

run-slow: bert, gemma2, llama, mistral, mixtral

@github-actions (Contributor)

This comment contains run-slow, running the specified jobs:

models: ["models/bert", "models/gemma2", "models/llama", "models/mistral", "models/mixtral"]
quantizations: []

@github-actions (Contributor)

CI Results

Workflow Run ⚙️

Commit Info

Context  Commit    Description
RUN      a561c45c  merge commit
PR       c329adbd  branch commit
main     9b4de431  base commit

✅ No failing test specific to this PR 🎉 👏 !

@Cyrilvallez Cyrilvallez merged commit e00bac4 into main on Feb 10, 2026
27 checks passed
@Cyrilvallez Cyrilvallez deleted the remove-slice branch February 10, 2026 17:45
jiosephlee pushed a commit to jiosephlee/transformers_latest that referenced this pull request Feb 11, 2026
* remove slice

* finalize

* remove shape check

* modular

* a few more

* a few tried to escape
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

4 participants