
Delay and probably avoid unnecessary graph breaks in _upad_input of modeling_flash_attention_utils.py#41097

Open
cyyever wants to merge 1 commit into huggingface:main from cyyever:flash_attn

Conversation

@cyyever
Contributor

@cyyever cyyever commented Sep 23, 2025

What does this PR do?

It works by refactoring `_get_unpad_data` so that `.item()` on the max-sequence-length value is deferred rather than called eagerly.
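For context, calling `.item()` on a tensor forces a host-device sync and, under `torch.compile`, a graph break. Below is a minimal sketch of a `_get_unpad_data`-style helper that keeps the max sequence length as a 0-dim tensor instead of materializing it immediately; the function name and exact shapes follow the upstream helper's convention, but this is an illustrative sketch under those assumptions, not the PR's actual diff:

```python
import torch
import torch.nn.functional as F

def get_unpad_data_lazy(attention_mask: torch.Tensor):
    """Sketch: like _get_unpad_data, but max_seqlen_in_batch stays a
    0-dim tensor, so no .item() (and thus no graph break) happens here."""
    seqlens_in_batch = attention_mask.sum(dim=-1, dtype=torch.int32)
    indices = torch.nonzero(attention_mask.flatten(), as_tuple=False).flatten()
    # Kept as a tensor; the caller decides if/when a Python int is needed.
    max_seqlen_in_batch = seqlens_in_batch.max()
    cu_seqlens = F.pad(
        torch.cumsum(seqlens_in_batch, dim=0, dtype=torch.int32), (1, 0)
    )
    return indices, cu_seqlens, max_seqlen_in_batch

mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])
indices, cu_seqlens, max_seqlen = get_unpad_data_lazy(mask)
# .item() is only paid at the call site that truly needs a Python int:
assert max_seqlen.item() == 3
assert cu_seqlens.tolist() == [0, 3, 5]
assert indices.tolist() == [0, 1, 2, 4, 5]
```

The design question debated below is whether any call site can actually avoid materializing the int, given where this helper is invoked.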

@cyyever cyyever marked this pull request as draft September 23, 2025 10:51
@cyyever cyyever force-pushed the flash_attn branch 2 times, most recently from 8cf2ae6 to dc5f4fe, on September 23, 2025 10:54
@cyyever cyyever marked this pull request as ready for review September 23, 2025 10:57
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
@cyyever cyyever changed the title Avoid unnecessary graph breaks in _upad_input of modeling_flash_attention_utils.py Delay and probably avoid unnecessary graph breaks in _upad_input of modeling_flash_attention_utils.py Sep 23, 2025
@Rocketknight1
Member

cc @Cyrilvallez for attention

@Cyrilvallez
Member

Hmm, I don't really see how this delays the graph break. Moreover, `max_seqlen_in_batch_k` is now a `Tensor` instead of an `int`, which is wrong.

@cyyever
Contributor Author

cyyever commented Sep 23, 2025

@Cyrilvallez It delays the break: `.item()` is called only when `query_length == kv_seq_len`; in the other cases, `.item()` is avoided entirely.
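The deferral pattern being described can be sketched as follows; the helper name and signature here are hypothetical, chosen only to illustrate how the `.item()` call (and hence the sync/graph break) is confined to the one branch that needs a Python int:

```python
import torch

def resolve_max_seqlen_k(max_seqlen_k: torch.Tensor,
                         query_length: int,
                         kv_seq_len: int):
    """Hypothetical helper: materialize the 0-dim tensor to a Python int
    only on the branch that requires one; other branches keep the tensor
    and trigger no host sync."""
    if query_length == kv_seq_len:
        # Only this path needs a Python int, so the graph break (if any)
        # is paid here rather than unconditionally upfront.
        return max_seqlen_k.item()
    return max_seqlen_k

t = torch.tensor(7)
assert resolve_max_seqlen_k(t, 4, 4) == 7          # int path: .item() runs
assert torch.is_tensor(resolve_max_seqlen_k(t, 2, 4))  # tensor path: no sync
```

Whether the tensor-valued path is acceptable downstream is exactly the objection raised above, since callers may expect an `int`.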

@Cyrilvallez
Member

Yes, but the function is called only if attention_mask is not None in `_flash_attention_forward`, in which case the graph break is unavoidable, if I'm not mistaken 🤔

@cyyever
Contributor Author

cyyever commented Sep 29, 2025

@Cyrilvallez 😭



3 participants