Add GraniteMoeHybrid support for 4.0 #37658
Conversation
cc @ArthurZucker for text models!
class GraniteMoeHybridSdpaAttention(GraniteMoeSharedSdpaAttention):
    pass

GRANITEMOEHYBRID_ATTENTION_CLASSES = {
Just as a heads up, I think it would be nice to follow the new attention interface (see #35235 for the original PR). Llama can also provide a good first pointer for this.
(Except I'm missing that this is a more special kind of attention here :D)
Thanks for the heads up @vasqu! We are still cleaning up this branch a bit, will take a look at this once the tests are in a better state 🙂
Thanks for the pointer @vasqu! Refactored this PR to the new attention interface 😄
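For readers following along: the refactor replaces per-backend subclasses like the GraniteMoeHybridSdpaAttention above with a single attention class that dispatches through ALL_ATTENTION_FUNCTIONS. Below is a minimal sketch of that pattern modeled on Llama, not copied from this PR; the grouped-query repeat_kv step is omitted for brevity.

```python
import torch
from torch import nn
from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS


def eager_attention_forward(module, query, key, value, attention_mask, scaling, dropout=0.0, **kwargs):
    # Plain softmax attention, used when config._attn_implementation == "eager".
    # Grouped-query repeat_kv is omitted here for brevity.
    attn_weights = torch.matmul(query, key.transpose(2, 3)) * scaling
    if attention_mask is not None:
        attn_weights = attn_weights + attention_mask[:, :, :, : key.shape[-2]]
    attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)
    attn_weights = nn.functional.dropout(attn_weights, p=dropout, training=module.training)
    attn_output = torch.matmul(attn_weights, value).transpose(1, 2).contiguous()
    return attn_output, attn_weights


# Inside the attention module's forward, the backend is resolved once instead
# of subclassing per backend:
#
#     attention_interface = eager_attention_forward
#     if self.config._attn_implementation != "eager":
#         attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
#     attn_output, attn_weights = attention_interface(
#         self, query, key, value, attention_mask, scaling=self.scaling
#     )
```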
ccing @molbap for mamba2/bamba (feels like I'm pinging you constantly 😆)
Force-pushed ac9b018 to d751d26
Force-pushed a70d949 to 8274d2c
Thanks @ArthurZucker! It's ready for another look when you get the chance!
ArthurZucker left a comment
Very nice use of modular, thanks a lot! 🤗
hidden_states = self.input_layernorm(hidden_states)
self_attn_weights = None
if self.layer_type == "mamba":
I am thinking let's remove the check on type and rely instead on checking self.self_attn is not None?
I agree, I also didn't like self.mamba being conditionally undefined. Updated this to define both in __init__ and just run mamba if self.mamba is not None, attention otherwise 🙂
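A rough sketch of the shape this lands in: both attributes always exist, exactly one is a module, and the forward branches on None. The stand-in submodules and names below are illustrative only, not the PR's actual classes.

```python
import torch
from torch import nn


class _MambaStandIn(nn.Module):
    # Stand-in for the real mamba mixer; only here so the sketch runs.
    def __init__(self, hidden_size):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states, **kwargs):
        return self.proj(hidden_states)


class _AttnStandIn(nn.Module):
    # Stand-in for the real attention module; returns (output, attn_weights).
    def __init__(self, hidden_size):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states, **kwargs):
        return self.proj(hidden_states), None


class DecoderLayerSketch(nn.Module):
    def __init__(self, hidden_size, layer_type):
        super().__init__()
        self.input_layernorm = nn.LayerNorm(hidden_size)  # the real model uses RMSNorm
        # Both attributes are always defined; exactly one is a module.
        self.mamba = _MambaStandIn(hidden_size) if layer_type == "mamba" else None
        self.self_attn = _AttnStandIn(hidden_size) if layer_type != "mamba" else None

    def forward(self, hidden_states, **kwargs):
        residual = hidden_states
        hidden_states = self.input_layernorm(hidden_states)
        self_attn_weights = None
        if self.mamba is not None:
            hidden_states = self.mamba(hidden_states, **kwargs)
        else:
            hidden_states, self_attn_weights = self.self_attn(hidden_states, **kwargs)
        return residual + hidden_states, self_attn_weights
```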
else:
    raise ValueError(f"Expected layer type in ['attention', 'mamba'], got {self.layer_type}")
hidden_states = self.post_attention_layernorm(hidden_states)
moe_hidden_states, router_logits = self.block_sparse_moe(hidden_states)
if self.shared_mlp is None:
I don't know if you answered this already: are there two different checkpoints being released, one with and one without this?
The models that are about to come out do use it! I think there are likely experiments ongoing without it, but am not sure about concrete plans for when they'll be released since I'm not the one training the models 🙂
In that case let's remove what's uncertain! 🤗
Sounds good! Removed the case with 0 experts, I'll open a follow-up PR if it ends up being used in a model to be released 😄
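Per the final "Always create the shared mlp" commit, the branch disappears and the two paths are combined unconditionally. A hedged sketch of that outcome, mirroring the GraniteMoeShared pattern with stand-in submodules rather than the PR's actual classes:

```python
from torch import nn


class MoeBlockSketch(nn.Module):
    """Sketch of the post-attention MoE block once shared_mlp always exists."""

    def __init__(self, hidden_size):
        super().__init__()
        # Stand-ins: the real modules are the sparse MoE and the shared-expert MLP.
        self.block_sparse_moe = nn.Linear(hidden_size, hidden_size)
        self.shared_mlp = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states):
        # No `if self.shared_mlp is None` branch anymore: the expert output and
        # the shared-expert output are simply summed.
        moe_hidden_states = self.block_sparse_moe(hidden_states)
        return moe_hidden_states + self.shared_mlp(hidden_states)
```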
if self.gradient_checkpointing and self.training:
    layer_outputs = self._gradient_checkpointing_func(
        decoder_layer.__call__,
        hidden_states,
        layer_mask,
        past_key_values,
        output_attentions,
        use_cache,
        cache_position,
        output_router_logits,
        position_embeddings,
    )
else:
    layer_outputs = decoder_layer(
        hidden_states,
        attention_mask=layer_mask,
        past_key_value=past_key_values,
        output_attentions=output_attentions,
        use_cache=use_cache,
        cache_position=cache_position,
        output_router_logits=output_router_logits,
        position_embeddings=position_embeddings,
    )
let's use the new GradientCheckpointingLayer, wdyt?
Definitely, that is a lot cleaner! I updated the models in the chain for modular to all use the gradient checkpointing layer (GraniteMoe/GraniteMoeShared/GraniteMoeHybrid)
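For context, the idea is that the layer inherits from GradientCheckpointingLayer so the two-way branch above collapses into a single call. A minimal sketch; the import path is as in recent transformers releases, double-check it against your installed version:

```python
from torch import nn
from transformers.modeling_layers import GradientCheckpointingLayer


class LayerSketch(GradientCheckpointingLayer):
    def __init__(self, hidden_size=8):
        super().__init__()
        self.mlp = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states, attention_mask=None):
        return self.mlp(hidden_states)


# The model loop then needs only the plain call; when gradient checkpointing
# is enabled during training, __call__ routes through the checkpointing
# function transparently:
#
#     layer_outputs = decoder_layer(hidden_states, attention_mask=layer_mask, ...)
```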
if not return_dict:
    return tuple(v for v in [hidden_states, next_cache, all_hidden_states, all_self_attns] if v is not None)
we have a @can_return_tuple for the forward
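For context, the decorator intercepts return_dict and converts the ModelOutput to a plain tuple itself. A minimal runnable sketch of the usage, with a stubbed config instead of a real PreTrainedModel:

```python
from types import SimpleNamespace

import torch
from transformers.modeling_outputs import BaseModelOutputWithPast
from transformers.utils import can_return_tuple


class ModelSketch(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # can_return_tuple consults self.config.use_return_dict when the caller
        # doesn't pass return_dict; a stub config keeps this sketch runnable.
        self.config = SimpleNamespace(use_return_dict=True)

    @can_return_tuple
    def forward(self, hidden_states):
        # Model body elided. The decorator converts the ModelOutput to a plain
        # tuple when the caller asks for it, so the manual
        # `if not return_dict: return tuple(...)` branch can be deleted.
        return BaseModelOutputWithPast(last_hidden_state=hidden_states)


model = ModelSketch()
out = model(torch.ones(1, 2, 4))                      # ModelOutput
tup = model(torch.ones(1, 2, 4), return_dict=False)   # plain tuple via the decorator
```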
class GraniteMoeHybridModelTester:
can we try to inherit tests from the closest model, so mamba, in the same fashion as here
Good idea! The closest model for the tests is Bamba. Consolidated a bit to use the Bamba tests, should be way easier to look at now 🤞
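The rough shape of that consolidation, as a hypothetical sketch: the relative import assumes the usual tests/models/<model>/ layout, and the exact base class and overridable attributes in the merged test file may differ.

```python
from transformers import GraniteMoeHybridConfig

from ..bamba.test_modeling_bamba import BambaModelTester


class GraniteMoeHybridModelTester(BambaModelTester):
    # Override only the MoE-specific knobs (config class, expert counts,
    # shared sizes, ...); the Bamba tester provides the mamba/attention
    # plumbing that would otherwise be copied verbatim.
    config_class = GraniteMoeHybridConfig
```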
berserkr left a comment
std is initialized twice: std = self.config.initializer_range
Thanks @berserkr! There were two because of …
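The fix lands as the "avoid redundant intermediate std var" commit: read initializer_range once and reuse it for every branch. A minimal sketch of the resulting weight init, not the exact merged code:

```python
from torch import nn


def _init_weights(self, module):
    # Single read of initializer_range, reused below; the duplicated
    # `std = self.config.initializer_range` was the redundancy flagged above.
    std = self.config.initializer_range
    if isinstance(module, nn.Linear):
        module.weight.data.normal_(mean=0.0, std=std)
        if module.bias is not None:
            module.bias.data.zero_()
    elif isinstance(module, nn.Embedding):
        module.weight.data.normal_(mean=0.0, std=std)
```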
Thank you very much for the fast review @ArthurZucker! I've made all the changes 🙂
Force-pushed 6b0ba0c to 1c0272a
Thanks @ArthurZucker! Added the missing TOC entry and removed the currently unused shared condition for the MLP, should pass now! 🤞
* initial config and MLA layer
* first pass at decoder
* completion of layers
* modeling class
* adding hybrid class to imports
* fix imports granitemoehybrid
* fix granitehybrid imports
* fix granitehybrid import
* fix generated modeling file
* add some comments
* minor fixes in layers
* add sharedMLP layer
* correct layer names
* fixes in mamba config
* fix mamba config
* change name of MLP layer
* fix seq mizer layers
* correct mamba config
* fixes in param names
* enable hybrid model
* update config
* fix config granite hybrid
* fix attention layer
* cleanup to re-use mamba code
* keep layer types
* attention bias cleanup
* update mamba layer name
* first pass at tests
* first pass at tests
* use granite attention
* fix: self attn weights
* pass at making pos_emb optional
* initialize self_attn only as needed
* overwrite forward to create HybridMambaCache
* Log invalid layer types
* Add attention outputs test
* Only emit attentions/logits if not None
* Fix config test hidden size divisibility
* mark granitmoehybrid as stateful
* Initialize mamba convolutional layers
* Formatting fixes
* config docstring, removed some unused attrs
* Fix missing arg in models test
* Fix create and check decoder model test
* support logits to keep in granitemoe
* regen to pass logits_to_keep
* Allow None or rope
* Fix gradient checkpointing
* Add granitemoehybrid as special cache for generate check
* Remove unused MLA refs
* Fix mamba layer mask
* Remove logits to keep from config
* Minor docstring nits
* Update licenses
* Enable cache by default
* map layer types to layer block type
* First pass at granite moe hybrid docs
* Ignore granite moe hybrid in valid checkpoint check
* Align attention interfaces
* regenerate modular granitemoeshared attention interface
* Align granite moe hybrid attn interface
* run formatting
* Handle mamba initialization
* avoid conditional attr defs
* Move hybrid layer validation to config
* Add placeholder integration tests
* Docs nits / Update model names
* Clean up forward conditions
* Use gradient checkpointing layer
* Remove some copied bamba tests + inherit: align test init, delete more tests, use common layer init with bamba tests, finish test consolidation
* avoid redundant intermediate std var
* use @can_return_tuple
* Remove unused moe state
* make skipped test names consistent
* Fix docstring order
* Add missing toc
* Always create the shared mlp
* Fix name in docstring
* link preview model in docs

---------
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Co-authored-by: Alex-Brooks <Alex.Brooks@ibm.com>
What does this PR do?
This PR adds support for the upcoming Granite 4.0 models. In terms of model architecture, it is a hybrid class with a shared MLP layer and Bamba layers. A configuration sketch follows.
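A hypothetical illustration of that hybrid layout: a config interleaving mamba and attention blocks. The attribute name and defaults below are assumptions; check the merged GraniteMoeHybridConfig for the exact spelling.

```python
from transformers import GraniteMoeHybridConfig

# Four layers, mostly mamba with one attention block mixed in.
config = GraniteMoeHybridConfig(
    num_hidden_layers=4,
    layer_types=["mamba", "mamba", "attention", "mamba"],
)
```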
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.