Fix GQA support in varlen attention by adding enable_gqa=True#2905

Closed
daniellepintz wants to merge 1 commit into main from dp/enable_gqa_varlen
Conversation

Contributor

@daniellepintz commented Apr 9, 2026

The attention refactor in #2761 moved GQA head expansion out of the attention modules, but the varlen_attn/varlen_attn_out calls were not updated to pass enable_gqa=True. This causes a ValueError when query and key/value have different numbers of heads (e.g. Hq=8, Hkv=4).

ValueError: Expect query and key/value to have the same number of heads but got Hq=4 and Hkv=2. Try setting enable_gqa=True for GQA.

Fixes both the core VarlenAttention (trainer path) and the RL PyTorchFlashAttentionImpl (vLLM generator path).
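Passing enable_gqa=True lets the attention kernel handle the Hq/Hkv mismatch internally; conceptually it is equivalent to the head expansion that #2761 removed from the attention modules, where each KV head is repeated across its group of query heads. A minimal numpy sketch of that expansion (the helper name `expand_kv_heads` is hypothetical, not from this codebase):

```python
import numpy as np

def expand_kv_heads(kv: np.ndarray, n_q_heads: int) -> np.ndarray:
    """Repeat each KV head across its query group (Hkv -> Hq).

    kv has shape (n_kv_heads, seq_len, head_dim).
    """
    n_kv_heads = kv.shape[0]
    assert n_q_heads % n_kv_heads == 0, "Hq must be a multiple of Hkv"
    group_size = n_q_heads // n_kv_heads
    # KV head i is shared by query heads [i*group_size, (i+1)*group_size)
    return np.repeat(kv, group_size, axis=0)

k = np.arange(4 * 3 * 2, dtype=np.float32).reshape(4, 3, 2)  # Hkv=4
k_expanded = expand_kv_heads(k, n_q_heads=8)                 # Hq=8
print(k_expanded.shape)  # (8, 3, 2)
```

With enable_gqa=True the caller skips this materialized expansion and the kernel indexes the shared KV heads directly, which is why the refactored call sites fail without the flag.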

@meta-cla bot added the CLA Signed label Apr 9, 2026
@daniellepintz (Contributor, Author) commented:

Fixed in #2891

Labels: ciflow/8gpu, CLA Signed