
Fix GraniteMoeHybrid _update_mamba_mask crash on attention-only models#45514

Merged
vasqu merged 4 commits into huggingface:main from tianhaocui:fix-granitemoehybrid-mamba-mask
Apr 27, 2026

Conversation

@tianhaocui
Contributor

Fixes #45507

Summary

GraniteMoeHybridModel._update_mamba_mask calls past_key_values.has_previous_state() without checking whether the model actually has mamba layers. When all layers are attention-only (no mamba layers in config.layers_block_type), has_previous_state() fails to find a LinearAttentionCacheLayerMixin layer and raises ValueError.

Fix

Check config.layers_block_type for mamba layers before calling has_previous_state(). If no mamba layers exist, return the attention mask as-is since the mamba mask optimization is irrelevant.

Applied to both modeling_granitemoehybrid.py and modular_granitemoehybrid.py.
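
As a concrete illustration of the guard described above, here is a minimal sketch of that first approach. The signature and surrounding mask logic are approximated from this description rather than copied from the diff (the module-level torch import of the modeling file is assumed), and the review below later moves this guard out to the caller.

```python
def _update_mamba_mask(self, attention_mask, cache_position, past_key_values):
    # Attention-only model: no mamba layers exist, so the mamba mask
    # optimization is irrelevant and has_previous_state() must not be called.
    if "mamba" not in self.config.layers_block_type:
        return attention_mask

    mamba_mask = attention_mask
    # Pre-fix, this check ran unconditionally; with no
    # LinearAttentionCacheLayerMixin layer in the cache it raised ValueError.
    if (past_key_values is not None and past_key_values.has_previous_state()) or (
        attention_mask is not None and torch.all(attention_mask == 1)
    ):
        mamba_mask = None
    return mamba_mask
```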

When all layers are attention layers (no mamba layers),
_update_mamba_mask calls past_key_values.has_previous_state() which
tries to find a LinearAttentionCacheLayerMixin layer. Since none
exist, it raises ValueError.

Skip the has_previous_state check entirely when the model has no
mamba layers, as the mamba mask optimization is irrelevant in that
case.

Fixes huggingface#45507
@Rocketknight1
Member

cc @vasqu maybe since you volunteered!

Contributor

@vasqu vasqu left a comment


Please see my comment and also add a small test for granite moe hybrid in this case

Comment thread on src/transformers/models/granitemoehybrid/modular_granitemoehybrid.py (outdated)
Instead of guarding inside _update_mamba_mask, skip the call entirely
in forward() when no mamba layers exist. This keeps _update_mamba_mask
focused on its original responsibility and avoids calling it on
attention-only models altogether.

Signed-off-by: root <root@hk760245497450.local>
@tianhaocui
Contributor Author

Thanks for the suggestion @vasqu — you're right that the guard belongs at the call site rather than inside _update_mamba_mask.

Updated in ec7cd01: the check is now in forward(), skipping _update_mamba_mask entirely when no mamba layers exist. I kept the two-variable structure (causal_mask + mamba_mask) instead of a dict since there are only two mask types and it stays consistent with the existing loop at L253.

_update_mamba_mask is reverted to its original form — it no longer needs to know about layer types.
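
For readers skimming the thread, a rough sketch of what that call-site guard and two-variable structure could look like; the helper signature, variable names, and loop shape here are assumptions based on this comment, not the merged code:

```python
# Sketch of the revised call flow in forward(); causal_mask is built earlier
# in forward() exactly as before, only the mamba mask computation changes.
mamba_mask = (
    self._update_mamba_mask(attention_mask, cache_position, past_key_values)
    if "mamba" in self.config.layers_block_type
    else attention_mask  # attention-only model: nothing to optimize, pass through
)

# Each layer picks the mask matching its type, mirroring the existing layer loop.
for decoder_layer, layer_type in zip(self.layers, self.config.layers_block_type):
    layer_mask = mamba_mask if layer_type == "mamba" else causal_mask
    hidden_states = decoder_layer(hidden_states, attention_mask=layer_mask)
```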

@vasqu
Contributor

vasqu commented Apr 23, 2026

Let's add a fast test as well please; should be easy by forcing the layer types on construction of the dummy model

Verifies that GraniteMoeHybrid models with all attention layers
(no mamba layers) can run forward without crashing. Regression
test for huggingface#45507.

Signed-off-by: root <root@hk760245497450.local>
@tianhaocui
Contributor Author

Added a fast test in 89ef90c: test_attention_only_forward constructs a model with layers_block_type set to all "attention" (no mamba layers) and runs a forward pass to verify it doesn't crash. This covers the regression from #45507.
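
For context, a hedged sketch of what such a regression test might look like. GraniteMoeHybridConfig and GraniteMoeHybridModel are the real classes, but the tiny config values below are illustrative assumptions and the actual test added in 89ef90c may be structured differently.

```python
import torch
from transformers import GraniteMoeHybridConfig, GraniteMoeHybridModel

def test_attention_only_forward():
    # Tiny attention-only config: every entry in layers_block_type is
    # "attention", so no mamba layers are constructed. Remaining fields are
    # left at their defaults and only shrunk here for speed.
    config = GraniteMoeHybridConfig(
        vocab_size=64,
        hidden_size=64,
        intermediate_size=64,
        num_hidden_layers=2,
        num_attention_heads=4,
        num_key_value_heads=2,
        layers_block_type=["attention", "attention"],
    )
    model = GraniteMoeHybridModel(config).eval()
    input_ids = torch.randint(0, config.vocab_size, (1, 8))

    # Regression check for #45507: this forward pass used to raise
    # ValueError from has_previous_state(); now it should simply run.
    with torch.no_grad():
        outputs = model(input_ids, use_cache=True)

    assert outputs.last_hidden_state.shape == (1, 8, config.hidden_size)
```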

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: granitemoehybrid

Contributor

@vasqu vasqu left a comment


Thanks, I only changed the logic in the model a bit to be closer to the hybrid attention patterns. Checking with run-slow in a sec; if everything passes / doesn't change, I'll merge.

@vasqu
Contributor

vasqu commented Apr 27, 2026

run-slow: granitemoehybrid

@github-actions
Contributor

Workflow Run ⚙️

This comment contains run-slow, running the specified jobs:

models: ["models/granitemoehybrid"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

Commit Info

Context  Commit    Description
RUN      1000bf90  workflow commit (merge commit)
PR       9d046317  branch commit (from PR)
main     bbb51c83  base commit (on main)

✅ No failing test specific to this PR 🎉 👏 !

@vasqu vasqu added this pull request to the merge queue Apr 27, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Merged via the queue into huggingface:main with commit 617e752 Apr 27, 2026
22 checks passed
ArthurZucker pushed a commit that referenced this pull request Apr 28, 2026
Fix GraniteMoeHybrid _update_mamba_mask crash on attention-only models (#45514)

* Fix GraniteMoeHybrid _update_mamba_mask crash on attention-only models

When all layers are attention layers (no mamba layers),
_update_mamba_mask calls past_key_values.has_previous_state() which
tries to find a LinearAttentionCacheLayerMixin layer. Since none
exist, it raises ValueError.

Skip the has_previous_state check entirely when the model has no
mamba layers, as the mamba mask optimization is irrelevant in that
case.

Fixes #45507

* Move mamba layer guard to forward() caller per review feedback

Instead of guarding inside _update_mamba_mask, skip the call entirely
in forward() when no mamba layers exist. This keeps _update_mamba_mask
focused on its original responsibility and avoids calling it on
attention-only models altogether.

Signed-off-by: root <root@hk760245497450.local>

* Add fast test for attention-only model forward pass

Verifies that GraniteMoeHybrid models with all attention layers
(no mamba layers) can run forward without crashing. Regression
test for #45507.

Signed-off-by: root <root@hk760245497450.local>

* fixup closer to hybrid attentions

---------

Signed-off-by: root <root@hk760245497450.local>
Co-authored-by: root <root@hk760245497450.local>
Co-authored-by: vasqu <antonprogamer@gmail.com>


Development

Successfully merging this pull request may close these issues.

GraniteMoEHybrid Model Calls Invalid Method

4 participants