[Attn] Allow dynamic causality in SDPA via Kwargs #41692
vasqu merged 3 commits into huggingface:main
    scaling: Optional[float] = None,
    sliding_window: Optional[int] = None,
    softcap: Optional[float] = None,
    is_causal: Optional[bool] = None,
Imo, it's nicer to define this as explicit kwarg here instead of doing kwarg.get...
zucchini-nlp
left a comment
Functionality-wise, this looks good to me. I think we can now stop passing it explicitly, as in
and fix models like CLIP that change `self` attributes at run-time. A search shows a few models with a similar pattern.
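To illustrate the point about run-time attribute changes, here is a minimal sketch in plain Python. All names (`toy_attention`, `MutatingAttention`, `KwargAttention`, the `is_text` flag) are hypothetical and are not taken from CLIP or from the library; the sketch only contrasts mutating `self.is_causal` with passing `is_causal` per call.

```python
from typing import Optional


def toy_attention(module, is_causal: Optional[bool] = None) -> bool:
    """Hypothetical stand-in for an attention function.

    Returns the causality it would use: the kwarg if given,
    otherwise the module's attribute (assumed default: True).
    """
    return is_causal if is_causal is not None else getattr(module, "is_causal", True)


class MutatingAttention:
    """Switches causality by rewriting the attribute at run-time."""

    def __init__(self):
        self.is_causal = False

    def forward(self, is_text: bool) -> bool:
        self.is_causal = is_text  # mutates module state; persists after the call
        return toy_attention(self)


class KwargAttention:
    """Switches causality per call; the attribute stays a fixed default."""

    def __init__(self):
        self.is_causal = False

    def forward(self, is_text: bool) -> bool:
        return toy_attention(self, is_causal=is_text)


mutating = MutatingAttention()
mutating_result = mutating.forward(is_text=True)

kwarg_based = KwargAttention()
kwarg_result = kwarg_based.forward(is_text=True)
```

Both variants use causal attention for the call, but only the first leaves the module in a different state afterwards (`mutating.is_causal` has changed, `kwarg_based.is_causal` has not).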
Cyrilvallez
left a comment
Nice, thanks for improving! Just 2 small comments!
    # Kwarg takes precedence over the defined module's attribute
    # - Allows dynamic switching, e.g. when model's switch based on the model input type (CLIP)
    # - Defaults to "normal" behavior for all attention types (encoder, decoder, cross)
nit: this comment looks a bit complicated to me, a simple "we give precedence to kwarg, then module if not present" as for fa would be clearer IMO, but no strong opinion - feel free to disregard if you think otherwise!
Lol, yea fair enough updating with the fa version in a second
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
#41692 (review) cc @yonigozlan @molbap: something to keep in mind for the vision model refactors!
* is causal as kwarg

* Update src/transformers/integrations/sdpa_attention.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix comment

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
As per title, it's
This lets us rely on the attribute set on the module by default, but override it with the kwarg if one is given. cc @zucchini-nlp
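The precedence described here can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `resolve_is_causal` is a hypothetical helper, and falling back to `True` when the module has no `is_causal` attribute is an assumption made for the example.

```python
from types import SimpleNamespace
from typing import Optional


def resolve_is_causal(module, is_causal: Optional[bool] = None) -> bool:
    """Hypothetical helper: kwarg first, then the module attribute."""
    # Compare against None rather than using `or`, so that an explicit
    # `is_causal=False` is respected instead of being treated as "not given".
    if is_causal is not None:
        return is_causal
    # Assumption for this sketch: a module without the attribute counts as causal.
    return getattr(module, "is_causal", True)


decoder_like = SimpleNamespace(is_causal=True)
encoder_like = SimpleNamespace(is_causal=False)

default_decoder = resolve_is_causal(decoder_like)                   # attribute used
default_encoder = resolve_is_causal(encoder_like)                   # attribute used
overridden_off = resolve_is_causal(decoder_like, is_causal=False)   # kwarg wins
overridden_on = resolve_is_causal(encoder_like, is_causal=True)     # kwarg wins
```

The `is not None` check is what makes `None` mean "use the module's setting" while still allowing a caller to force non-causal attention with an explicit `False`.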