
Pass packed boundary metadata to Qwen3.5 linear-attention fast kernels#44867

Closed
sdharani91 wants to merge 1 commit into huggingface:main from sdharani91:feature_packing_qwen

Conversation

@sdharani91

@sdharani91 sdharani91 commented Mar 19, 2026

What does this PR do?

Fixes #44717

This PR fixes packed-sequence handling for the Qwen3.5 linear-attention fast path.

Before this change, Qwen3.5 produced different outputs for:

- a padded representation of multiple sequences
- a packed representation of the same sequences with reset position_ids
The issue was specific to the linear-attention fast path. Full-attention layers already respected packed boundaries through the shared masking logic, but the Qwen3.5 fast linear-attention path was not passing packed-boundary metadata into its kernels.

This PR fixes that by:

- deriving packed-boundary metadata from the packed position_ids
- passing seq_idx to the causal-convolution fast path
- passing cu_seqlens to the FLA gated-delta-rule fast path

The change is intentionally scoped to the Qwen3.5 fast path for packed prefill inputs. The slow fallback path is not changed in this PR.
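The derivation step above can be sketched as follows. This is a minimal illustration, not the PR's actual helper: `get_packed_metadata` is a hypothetical name, and it assumes the packed batch has shape `(1, total_len)` with position_ids that reset to 0 at each segment start.

```python
# Hedged sketch: derive packed-boundary metadata from reset position_ids.
# `get_packed_metadata` is an illustrative name, not the helper in the PR.
import torch

def get_packed_metadata(position_ids: torch.Tensor):
    """For a packed batch of shape (1, total_len) whose position_ids reset
    to 0 at each segment start, return (cu_seqlens, seq_idx)."""
    pos = position_ids[0]                                 # (total_len,)
    # A new segment starts wherever the position counter resets to 0.
    starts = torch.nonzero(pos == 0, as_tuple=False).flatten()
    total_len = torch.tensor([pos.numel()], device=pos.device)
    cu_seqlens = torch.cat([starts, total_len]).to(torch.int32)
    # seq_idx labels each token with its segment index, shape (1, total_len).
    seq_idx = torch.cumsum((pos == 0).to(torch.int32), dim=0) - 1
    return cu_seqlens, seq_idx.unsqueeze(0)

pos = torch.tensor([[0, 1, 2, 0, 1, 0, 1, 2, 3]])
cu, idx = get_packed_metadata(pos)
# cu  -> tensor([0, 3, 5, 9], dtype=torch.int32)
# idx -> tensor([[0, 0, 0, 1, 1, 2, 2, 2, 2]], dtype=torch.int32)
```

In this shape, `cu_seqlens` is the cumulative-boundary form expected by varlen-style kernels, and `seq_idx` is the per-token segment label used by the convolution path.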

How was this tested?

Manual validation:

- Reproduced the bug before the fix on Qwen3.5 using a tiny local config with one full-attention layer and one linear-attention layer.
- Compared:
  - padded inputs for multiple sequences
  - packed inputs for the same sequences with reset position_ids
- Before the fix on the fast path:
  - allclose: False
  - max abs diff was about 8e-3
- After the fix on the fast path:
  - the original 2-segment packed-vs-padded repro matches
  - a multi-segment packed-vs-padded repro also matches, with max abs diff around 6e-8
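The padded-vs-packed comparison above can be sketched like this. The input construction is concrete; the model calls are left as comments because they assume a hypothetical tiny Qwen3.5-style checkpoint not specified here.

```python
# Hedged sketch of the packed-vs-padded equivalence check described above.
import torch

seqs = [torch.tensor([5, 6, 7]), torch.tensor([8, 9])]

# Padded: one row per sequence, right-padded (pad_id = 0 is an assumption).
pad_id = 0
max_len = max(s.numel() for s in seqs)
padded = torch.full((len(seqs), max_len), pad_id)
for i, s in enumerate(seqs):
    padded[i, : s.numel()] = s
attn_mask = (padded != pad_id).long()

# Packed: one row, sequences concatenated, position_ids reset per segment.
packed = torch.cat(seqs).unsqueeze(0)
position_ids = torch.cat(
    [torch.arange(s.numel()) for s in seqs]
).unsqueeze(0)

# With a model in hand (not shown), the repro compares per-segment logits:
# out_padded = model(padded, attention_mask=attn_mask).logits
# out_packed = model(packed, position_ids=position_ids).logits
# torch.allclose(out_packed[0, :3], out_padded[0, :3], atol=1e-6)
```

Before the fix, the linear-attention fast path let state leak across the segment boundary in the packed row, which is what the ~8e-3 max abs diff reflects.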
Sanity checks:

- Verified Qwen3.5 was using the fast kernels:
  - causal_conv1d_fn present: True
  - fla.ops.gated_delta_rule.chunk
  - fla.ops.gated_delta_rule.fused_recurrent
- Verified a normal unpacked Qwen3.5 forward still works after the change.

Unit tests:

Added tests for the packed-metadata helper in tests/models/qwen3_5/test_modeling_qwen3_5.py, including:

- simple packed input
- multi-segment packed input
- cases where packed metadata should be skipped, such as cached inputs or unsupported batch layouts

Who can review?

@vasqu

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: qwen3_5


Development

Successfully merging this pull request may close these issues.

Support packed sequences for linear attention models (i.e. Qwen3.5)