Fix GraniteMoeHybrid _update_mamba_mask crash on attention-only models #45514
Conversation
When all layers are attention layers (no mamba layers), _update_mamba_mask calls past_key_values.has_previous_state(), which tries to find a LinearAttentionCacheLayerMixin layer. Since none exist, it raises ValueError.

Skip the has_previous_state check entirely when the model has no mamba layers, as the mamba mask optimization is irrelevant in that case.

Fixes huggingface#45507
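For context, a minimal sketch of the helper with the guard described above; the signature and the exact condition for dropping the mamba mask are assumptions for illustration, not the upstream diff:

```python
import torch

# Minimal sketch of GraniteMoeHybridModel._update_mamba_mask with the guard
# described above (illustrative only, not the exact upstream code).
def _update_mamba_mask(self, attention_mask, cache_position, past_key_values):
    if "mamba" not in self.config.layers_block_type:
        # Attention-only model: there is no mamba layer to mask, and
        # past_key_values.has_previous_state() would raise ValueError because
        # the cache holds no LinearAttentionCacheLayerMixin layer.
        return attention_mask

    mamba_mask = attention_mask
    # Assumption: the mask can be dropped when a previous state exists or when
    # it is all ones, since the mamba path does not need it in those cases.
    if (past_key_values is not None and past_key_values.has_previous_state()) or (
        attention_mask is not None and torch.all(attention_mask == 1)
    ):
        mamba_mask = None
    return mamba_mask
```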
cc @vasqu maybe since you volunteered!
vasqu
left a comment
Please see my comment and also add a small test for granite moe hybrid in this case
Instead of guarding inside _update_mamba_mask, skip the call entirely in forward() when no mamba layers exist. This keeps _update_mamba_mask focused on its original responsibility and avoids calling it on attention-only models altogether.

Signed-off-by: root <root@hk760245497450.local>
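Roughly, the call-site guard in forward() would read as follows (hand-written illustration; the surrounding variable names are assumptions, not the actual diff):

```python
# Inside GraniteMoeHybridModel.forward(), before the decoder layer loop
# (illustration only; variable names are assumptions).
has_mamba_layers = "mamba" in self.config.layers_block_type
mamba_mask = (
    self._update_mamba_mask(attention_mask, cache_position, past_key_values)
    if has_mamba_layers
    else None
)
# Attention-only models never read mamba_mask, so None is safe here and
# _update_mamba_mask keeps its original, mamba-specific responsibility.
```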
Thanks for the suggestion @vasqu, you're right that the guard belongs at the call site rather than inside _update_mamba_mask. Updated in ec7cd01: the check is now in forward().
Let's add a fast test as well please; should be easy by forcing the layer types on construction of the dummy model
Verifies that GraniteMoeHybrid models with all attention layers (no mamba layers) can run forward without crashing. Regression test for huggingface#45507.

Signed-off-by: root <root@hk760245497450.local>
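A possible shape for that fast test, forcing attention-only layer types on a tiny dummy config; GraniteMoeHybridConfig, GraniteMoeHybridModel, and layers_block_type come from the discussion above, while the remaining constructor arguments are assumptions and may need adjusting:

```python
import torch
from transformers import GraniteMoeHybridConfig, GraniteMoeHybridModel


def test_attention_only_model_forward():
    # Force every layer to be an attention layer so no mamba cache is needed.
    # The tiny sizes below are placeholders for a fast test.
    config = GraniteMoeHybridConfig(
        num_hidden_layers=2,
        layers_block_type=["attention"] * 2,
        hidden_size=32,
        num_attention_heads=4,
        num_key_value_heads=2,
    )
    model = GraniteMoeHybridModel(config).eval()

    input_ids = torch.randint(0, config.vocab_size, (1, 8))
    with torch.no_grad():
        # Before the fix this raised ValueError while looking for a
        # LinearAttentionCacheLayerMixin layer in the cache.
        output = model(input_ids)

    assert output.last_hidden_state.shape == (1, 8, config.hidden_size)
```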
[For maintainers] Suggested jobs to run (before merge)

run-slow: granitemoehybrid
vasqu
left a comment
Thanks, I only changed the logic in the model a bit to be closer to hybrid attention patterns - checking with run-slow in a sec; if everything passes / doesn't change, I'll merge
run-slow: granitemoehybrid
This comment contains models: ["models/granitemoehybrid"]
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Fix GraniteMoeHybrid _update_mamba_mask crash on attention-only models (#45514)

* Fix GraniteMoeHybrid _update_mamba_mask crash on attention-only models

  When all layers are attention layers (no mamba layers), _update_mamba_mask calls past_key_values.has_previous_state() which tries to find a LinearAttentionCacheLayerMixin layer. Since none exist, it raises ValueError. Skip the has_previous_state check entirely when the model has no mamba layers, as the mamba mask optimization is irrelevant in that case. Fixes #45507

* Move mamba layer guard to forward() caller per review feedback

  Instead of guarding inside _update_mamba_mask, skip the call entirely in forward() when no mamba layers exist. This keeps _update_mamba_mask focused on its original responsibility and avoids calling it on attention-only models altogether.

  Signed-off-by: root <root@hk760245497450.local>

* Add fast test for attention-only model forward pass

  Verifies that GraniteMoeHybrid models with all attention layers (no mamba layers) can run forward without crashing. Regression test for #45507.

  Signed-off-by: root <root@hk760245497450.local>

* fixup closer to hybrid attentions

---------

Signed-off-by: root <root@hk760245497450.local>
Co-authored-by: root <root@hk760245497450.local>
Co-authored-by: vasqu <antonprogamer@gmail.com>
Fixes #45507
Summary
GraniteMoeHybridModel._update_mamba_mask calls past_key_values.has_previous_state() without checking whether the model actually has mamba layers. When all layers are attention-only (no mamba layers in config.layers_block_type), has_previous_state() fails to find a LinearAttentionCacheLayerMixin layer and raises ValueError.

Fix

Check config.layers_block_type for mamba layers before calling has_previous_state(). If no mamba layers exist, return the attention mask as-is since the mamba mask optimization is irrelevant.

Applied to both modeling_granitemoehybrid.py and modular_granitemoehybrid.py.
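As a quick sanity check, inspecting layers_block_type on a loaded config shows whether a checkpoint takes the attention-only path; the checkpoint path below is a placeholder, not a real model id:

```python
from transformers import AutoConfig

# Placeholder path; substitute a real GraniteMoeHybrid checkpoint.
config = AutoConfig.from_pretrained("path/to/granitemoehybrid-checkpoint")
print(config.layers_block_type)             # e.g. ["attention", "attention", ...]
print("mamba" in config.layers_block_type)  # False -> the mask is returned as-is
```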