
[Attn] Allow dynamic causality in SDPA via Kwargs #41692

Merged
vasqu merged 3 commits into huggingface:main from vasqu:causal-kwarg-sdpa on Oct 17, 2025

Conversation

@vasqu
Contributor

@vasqu commented Oct 17, 2025

As per title.

This allows us to rely on the module's attribute by default but override it with the kwarg when given. cc @zucchini-nlp
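
For reference, a minimal sketch of the intended precedence, with hypothetical names rather than the actual diff in src/transformers/integrations/sdpa_attention.py:

```python
from typing import Optional

import torch
import torch.nn.functional as F


def sdpa_attention_forward(
    module: torch.nn.Module,
    query: torch.Tensor,   # (batch, heads, seq_len, head_dim)
    key: torch.Tensor,
    value: torch.Tensor,
    attention_mask: Optional[torch.Tensor] = None,
    is_causal: Optional[bool] = None,  # new explicit kwarg
    **kwargs,
):
    # Kwarg takes precedence; when not given, fall back to the module's attribute.
    # (PyTorch SDPA rejects an explicit mask combined with is_causal=True, so the
    # fallback only turns causality on when no mask is passed.)
    if is_causal is None:
        is_causal = getattr(module, "is_causal", False) and attention_mask is None

    attn_output = F.scaled_dot_product_attention(
        query, key, value, attn_mask=attention_mask, is_causal=is_causal
    )
    # Return in (batch, seq_len, heads, head_dim) layout.
    return attn_output.transpose(1, 2).contiguous(), None
```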

scaling: Optional[float] = None,
sliding_window: Optional[int] = None,
softcap: Optional[float] = None,
is_causal: Optional[bool] = None,
Contributor Author


IMO, it's nicer to define this as an explicit kwarg here instead of doing kwargs.get(...)

Comment thread src/transformers/integrations/sdpa_attention.py
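
To illustrate the design choice from the comment above with two hypothetical minimal functions (not the library code): an explicit keyword argument keeps is_causal visible in the signature, while pulling it out of **kwargs hides it from readers and tooling.

```python
from typing import Optional

import torch
import torch.nn.functional as F


# Explicit kwarg: the parameter and its default are part of the signature.
def attn_explicit(q, k, v, is_causal: Optional[bool] = None, **kwargs):
    return F.scaled_dot_product_attention(q, k, v, is_causal=bool(is_causal))


# kwargs.get(...): behaves the same here, but the parameter is invisible in the
# signature and easy to misspell without any error being raised.
def attn_from_kwargs(q, k, v, **kwargs):
    return F.scaled_dot_product_attention(q, k, v, is_causal=bool(kwargs.get("is_causal")))
```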
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@zucchini-nlp left a comment


Functionality-wise this looks good to me. I think we can now delete passing it explicitly, as in

is_causal=self.is_causal,

and fix models like CLIP that change self attributes at run-time. Searching shows a few models with a similar pattern.
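
As a sketch of what that could look like for a CLIP-style layer (hypothetical class, reusing the sdpa_attention_forward sketch from above): instead of flipping self.is_causal at run-time, the caller passes the desired causality per forward call.

```python
import torch
from torch import nn


class ClipStyleAttention(nn.Module):
    """Hypothetical layer: causality is chosen per call rather than by mutating self.is_causal."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.qkv = nn.Linear(hidden_size, 3 * hidden_size)
        self.out_proj = nn.Linear(hidden_size, hidden_size)
        self.is_causal = False  # module default, e.g. the vision/encoder path

    def forward(self, hidden_states, attention_mask=None, is_causal=None):
        batch, seq_len, _ = hidden_states.shape
        q, k, v = self.qkv(hidden_states).chunk(3, dim=-1)
        q, k, v = (
            t.view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
            for t in (q, k, v)
        )
        # The kwarg overrides the module attribute only when it is provided.
        attn, _ = sdpa_attention_forward(  # sketch defined earlier in this thread
            self, q, k, v, attention_mask, is_causal=is_causal
        )
        return self.out_proj(attn.reshape(batch, seq_len, -1))


# Usage: same module, different causality per call, no attribute mutation.
layer = ClipStyleAttention(hidden_size=64, num_heads=4)
x = torch.randn(2, 10, 64)
vision_out = layer(x)                # falls back to self.is_causal (False)
text_out = layer(x, is_causal=True)  # causal attention for the text path
```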

Member

@Cyrilvallez left a comment


Nice, thanks for improving! Just 2 small comments!

Comment on lines +71 to +73
# Kwarg takes precedence over the defined module's attribute
# - Allows dynamic switching, e.g. when models switch based on the model input type (CLIP)
# - Defaults to "normal" behavior for all attention types (encoder, decoder, cross)
Member


nit: this comment looks a bit complicated to me, a simple "we give precedence to kwarg, then module if not present" as for fa would be clearer IMO, but no strong opinion - feel free to disregard if you think otherwise!

Contributor Author


Lol, yea fair enough, updating with the fa version in a second.

Comment thread src/transformers/integrations/sdpa_attention.py Outdated
vasqu and others added 2 commits October 17, 2025 17:39
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
@vasqu
Contributor Author

vasqu commented Oct 17, 2025

#41692 (review) cc @yonigozlan @molbap for vision model refactors to keep in mind!

@vasqu enabled auto-merge (squash) October 17, 2025 15:42
@vasqu merged commit 7e204ad into huggingface:main Oct 17, 2025
22 checks passed
@vasqu deleted the causal-kwarg-sdpa branch October 17, 2025 15:52
ngazagna-qc pushed a commit to ngazagna-qc/transformers that referenced this pull request Oct 23, 2025
* is causal as kwarg

* Update src/transformers/integrations/sdpa_attention.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix comment

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* is causal as kwarg

* Update src/transformers/integrations/sdpa_attention.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix comment

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>