flash_paged: s_aux may not exist by pcuenca · Pull Request #40434 · huggingface/transformers

pcuenca · 2025-08-25T15:10:57Z

Some implementations (e.g., https://huggingface.co/kernels-community/vllm-flash-attn3) support an `s_aux` argument for attention sinks, but others (e.g., https://huggingface.co/kernels-community/flash-attn) do not. If `s_aux` is present in the kwargs, we forward it; otherwise we don't.

The user will still get an error if they use a model like gpt-oss-20b with an implementation that does not support `s_aux`, but models that don't use it won't error out. For example, [this is currently failing](https://github.com/huggingface/transformers/blob/399cd5c04b11ba3f740b4f76e8067326786405cc/examples/pytorch/continuous_batching.py#L16) because we are sending `s_aux: None` in the dict. We get:

TypeError: flash_attn_varlen_func() got an unexpected keyword argument 's_aux'
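The forwarding rule described above can be sketched roughly as follows. This is not the actual diff from this PR: `attn_without_sinks`, `attn_with_sinks`, and `paged_forward` are hypothetical names used only for illustration, and the real kernels take tensors rather than plain values.

```python
from typing import Any, Callable


def attn_without_sinks(q, k, v):
    """Hypothetical kernel with no `s_aux` parameter."""
    return ("no_sinks", q, k, v)


def attn_with_sinks(q, k, v, s_aux=None):
    """Hypothetical kernel that accepts `s_aux` for attention sinks."""
    return ("sinks", q, k, v, s_aux)


def paged_forward(kernel: Callable[..., Any], q, k, v, **kwargs):
    # Forward `s_aux` only when the caller supplied it, so kernels that
    # do not know the argument never see the keyword at all.
    extra = {"s_aux": kwargs["s_aux"]} if "s_aux" in kwargs else {}
    return kernel(q, k, v, **extra)
```

With this shape, a model that never passes `s_aux` works with either kernel, while a model that does pass it still raises `TypeError` on a kernel that lacks the parameter, which matches the behavior the description calls out.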

HuggingFaceDocBuilderDev · 2025-08-25T15:20:15Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

absolutely! thanks for the fix

…5589)

Guard `s_aux` cast in `flash_attention_forward` for sink-less models

`flash_attention_forward` unconditionally called `s_aux.to(query.dtype)`, which crashed with `AttributeError: 'NoneType' object has no attribute 'to'` for models that don't use attention sinks (e.g. Gemma). Mirrors the parallel guard added in #40434 for `flash_paged.py`.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
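A guard of the kind this commit message describes might look like the sketch below. It is an assumption about the shape of the fix, not the code from the referenced commit: `FakeTensor` is a hypothetical stand-in that models only `.to(dtype)`, and `cast_s_aux` is an illustrative helper name.

```python
class FakeTensor:
    """Hypothetical stand-in for a tensor; models only `.to(dtype)`."""

    def __init__(self, dtype):
        self.dtype = dtype

    def to(self, dtype):
        return FakeTensor(dtype)


def cast_s_aux(s_aux, query):
    # Cast only when sinks are present; sink-less models pass None
    # through instead of hitting AttributeError on `None.to(...)`.
    return s_aux.to(query.dtype) if s_aux is not None else None
```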

pcuenca requested a review from ArthurZucker August 25, 2025 15:11

ArthurZucker approved these changes Aug 26, 2025


ArthurZucker merged commit 58cebc8 into main Aug 26, 2025
22 of 25 checks passed

ArthurZucker deleted the s_aux_opt branch August 26, 2025 11:16

jamesbraza mentioned this pull request Apr 23, 2026

integrations/flash_attention.py crashes with AttributeError on s_aux=None for sink-less models #45588

Closed

4 tasks

ghost mentioned this pull request Apr 23, 2026

fix #45588: guard s_aux against None in flash_attention_forward #45590

Closed

2 tasks


