
[Attn] Allow dynamic causality in SDPA via Kwargs #41692

Merged
vasqu merged 3 commits into huggingface:main from vasqu:causal-kwarg-sdpa on Oct 17, 2025

Conversation

@vasqu
Contributor

@vasqu commented Oct 17, 2025

As per title.

This allows us to rely on the module's attribute by default but override it with the kwarg when given. cc @zucchini-nlp
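
For reference, a minimal sketch of the intended precedence, with hypothetical names rather than the actual diff in src/transformers/integrations/sdpa_attention.py:

```python
from typing import Optional

import torch
import torch.nn.functional as F


def sdpa_attention_forward(
    module: torch.nn.Module,
    query: torch.Tensor,   # (batch, heads, seq_len, head_dim)
    key: torch.Tensor,
    value: torch.Tensor,
    attention_mask: Optional[torch.Tensor] = None,
    is_causal: Optional[bool] = None,  # new explicit kwarg
    **kwargs,
):
    # Kwarg takes precedence; when not given, fall back to the module's attribute.
    # (PyTorch SDPA rejects an explicit mask combined with is_causal=True, so the
    # fallback only turns causality on when no mask is passed.)
    if is_causal is None:
        is_causal = getattr(module, "is_causal", False) and attention_mask is None

    attn_output = F.scaled_dot_product_attention(
        query, key, value, attn_mask=attention_mask, is_causal=is_causal
    )
    # Return in (batch, seq_len, heads, head_dim) layout.
    return attn_output.transpose(1, 2).contiguous(), None
```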

scaling: Optional[float] = None,
sliding_window: Optional[int] = None,
softcap: Optional[float] = None,
is_causal: Optional[bool] = None,
Contributor Author


IMO, it's nicer to define this as an explicit kwarg here instead of doing kwargs.get(...)

Comment thread src/transformers/integrations/sdpa_attention.py
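
To illustrate the design choice from the comment above with two hypothetical minimal functions (not the library code): an explicit keyword argument keeps is_causal visible in the signature, while pulling it out of **kwargs hides it from readers and tooling.

```python
from typing import Optional

import torch
import torch.nn.functional as F


# Explicit kwarg: the parameter and its default are part of the signature.
def attn_explicit(q, k, v, is_causal: Optional[bool] = None, **kwargs):
    return F.scaled_dot_product_attention(q, k, v, is_causal=bool(is_causal))


# kwargs.get(...): behaves the same here, but the parameter is invisible in the
# signature and easy to misspell without any error being raised.
def attn_from_kwargs(q, k, v, **kwargs):
    return F.scaled_dot_product_attention(q, k, v, is_causal=bool(kwargs.get("is_causal")))
```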
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@zucchini-nlp left a comment


Functionality-wise this looks good to me. I think we can now delete passing it explicitly, as in

is_causal=self.is_causal,

and fix models like CLIP that change self attributes at run-time. Searching shows a few models with a similar pattern.
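
As a sketch of what that could look like for a CLIP-style layer (hypothetical class, reusing the sdpa_attention_forward sketch from above): instead of flipping self.is_causal at run-time, the caller passes the desired causality per forward call.

```python
import torch
from torch import nn


class ClipStyleAttention(nn.Module):
    """Hypothetical layer: causality is chosen per call rather than by mutating self.is_causal."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.qkv = nn.Linear(hidden_size, 3 * hidden_size)
        self.out_proj = nn.Linear(hidden_size, hidden_size)
        self.is_causal = False  # module default, e.g. the vision/encoder path

    def forward(self, hidden_states, attention_mask=None, is_causal=None):
        batch, seq_len, _ = hidden_states.shape
        q, k, v = self.qkv(hidden_states).chunk(3, dim=-1)
        q, k, v = (
            t.view(batch, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
            for t in (q, k, v)
        )
        # The kwarg overrides the module attribute only when it is provided.
        attn, _ = sdpa_attention_forward(  # sketch defined earlier in this thread
            self, q, k, v, attention_mask, is_causal=is_causal
        )
        return self.out_proj(attn.reshape(batch, seq_len, -1))


# Usage: same module, different causality per call, no attribute mutation.
layer = ClipStyleAttention(hidden_size=64, num_heads=4)
x = torch.randn(2, 10, 64)
vision_out = layer(x)                # falls back to self.is_causal (False)
text_out = layer(x, is_causal=True)  # causal attention for the text path
```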

Member

@Cyrilvallez left a comment


Nice, thanks for improving! Just 2 small comments!

Comment on lines +71 to +73
# Kwarg takes precedence over the defined module's attribute
# - Allows dynamic switching, e.g. when models switch based on the model input type (CLIP)
# - Defaults to "normal" behavior for all attention types (encoder, decoder, cross)
Member


nit: this comment looks a bit complicated to me, a simple "we give precedence to kwarg, then module if not present" as for fa would be clearer IMO, but no strong opinion - feel free to disregard if you think otherwise!

Contributor Author


Lol, yea fair enough, updating with the fa version in a second.

Comment thread src/transformers/integrations/sdpa_attention.py Outdated
vasqu and others added 2 commits October 17, 2025 17:39
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
@vasqu
Contributor Author

vasqu commented Oct 17, 2025

#41692 (review) cc @yonigozlan @molbap for vision model refactors to keep in mind!

@vasqu enabled auto-merge (squash) October 17, 2025 15:42
@vasqu merged commit 7e204ad into huggingface:main Oct 17, 2025
22 checks passed
@vasqu deleted the causal-kwarg-sdpa branch October 17, 2025 15:52
ngazagna-qc pushed a commit to ngazagna-qc/transformers that referenced this pull request Oct 23, 2025
* is causal as kwarg

* Update src/transformers/integrations/sdpa_attention.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix comment

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* is causal as kwarg

* Update src/transformers/integrations/sdpa_attention.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix comment

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>