Fix mamba regression by manueldeprada · Pull Request #39728 · huggingface/transformers

manueldeprada · 2025-07-28T11:24:21Z

This fixes the sneaky regression introduced in #38086 causing loading errors for falcon_mamba:

    "line": "tests/models/falcon_mamba/test_modeling_falcon_mamba.py::FalconMambaModelTest::test_model_from_pretrained",
    "trace": "(line 2593)  RuntimeError: Error(s) in loading state_dict for FalconMambaMixer:"
},
{
    "line": "tests/models/falcon_mamba/test_modeling_falcon_mamba.py::FalconMambaIntegrationTests::test_batched_generation",
    "trace": "(line 2593)  RuntimeError: Error(s) in loading state_dict for FalconMambaMixer:"
},
{
    "line": "tests/models/falcon_mamba/test_modeling_falcon_mamba.py::FalconMambaIntegrationTests::test_generation_4bit",
    "trace": "(line 2593)  RuntimeError: Error(s) in loading state_dict for FalconMambaMixer:"
},
{
    "line": "tests/models/falcon_mamba/test_modeling_falcon_mamba.py::FalconMambaIntegrationTests::test_generation_fp16",
    "trace": "(line 2593)  RuntimeError: Error(s) in loading state_dict for FalconMambaMixer:"
},
{
    "line": "tests/models/falcon_mamba/test_modeling_falcon_mamba.py::FalconMambaIntegrationTests::test_generation_torch_compile",
    "trace": "(line 2593)  RuntimeError: Error(s) in loading state_dict for FalconMambaMixer:"

The gist of the problem: modular forces the super().__init__ call to the top of FalconMambaConfig. Before the modular rewrite, the call was at the bottom, which was critical for the intermediate_size value from the config file to take effect.

Second bug fixed: tests/models/mamba/test_modeling_mamba.py::MambaIntegrationTests::test_compile_mamba_cache was failing due to a misplaced import.

HuggingFaceDocBuilderDev · 2025-07-28T11:37:21Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

manueldeprada · 2025-07-28T11:57:46Z

run-slow: falcon_mamba

github-actions · 2025-07-28T11:59:04Z

This comment contains run-slow, running the specified jobs:

models: ['models/falcon_mamba']
quantizations: [] ...

github-actions · 2025-07-28T14:29:13Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: falcon_mamba, mamba

manueldeprada · 2025-07-28T14:33:58Z

run-slow: falcon_mamba, mamba

github-actions · 2025-07-28T14:35:19Z

This comment contains run-slow, running the specified jobs:

models: ['models/falcon_mamba', 'models/mamba']
quantizations: [] ...

ydshieh · 2025-07-28T14:45:27Z

+if is_mambapy_available():
+    from mambapy.pscan import pscan
+else:
+    pscan = None


Does this model require mambapy.pscan, or could it work with pscan = None too?

It uses pscan when the fast path (the mambapy library) is available. Otherwise it defaults to a slow (Python code) forward pass.

The import was at the top before; I moved it for debugging and it slipped in 😅. I am now reverting the change.

sure.

(FYI, so for the test of this model, we are testing the slow path I think)
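The optional-import fallback discussed in this thread can be sketched in a self-contained way. This is only an illustration: `optional_attr` and `select_scan_path` are hypothetical helpers, not part of transformers or mambapy, and the sketch never calls `pscan` itself; it only shows how a `None` value selects the slow path.

```python
import importlib


def optional_attr(module_name, attr_name):
    """Return module_name.attr_name, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, attr_name, None)


# Similar in spirit to the if/else import in the diff:
# pscan ends up as a callable when mambapy is installed, otherwise None.
pscan = optional_attr("mambapy.pscan", "pscan")


def select_scan_path(fast_kernel=pscan):
    """Hypothetical dispatcher: report which path a forward pass would take."""
    return "fast" if fast_kernel is not None else "slow"


print(select_scan_path())
```

On a machine without mambapy installed, `pscan` is `None` and the dispatcher reports the slow path, which matches the observation that the tests exercise the slow path.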

ydshieh · 2025-07-28T14:53:54Z

+        # This is needed since mamba overrides the intermediate_size attribute
+        self.intermediate_size = (
+            int(expand * self.hidden_size)
+            if kwargs.get("intermediate_size") is None
+            else kwargs.get("intermediate_size")
+        )


Could you explain this part a bit more for me 🙏 , but I believe you are right.

When using modular_converter, super().__init__() unravels into MambaConfig.__init__, which sets intermediate_size to int(expand * self.hidden_size), overriding any value passed via kwargs.

Before #38086, setting intermediate_size to int(expand * self.hidden_size) wasn't an issue because PretrainedConfig.__init__() was called last, and the kwargs value prevailed.

However, #38086 reversed that order due to modular_converter, which forces PretrainedConfig.__init__() to run first, so the derived value then overwrites the intermediate_size passed via kwargs.

The new code explicitly assigns intermediate_size to ensure the kwargs value takes precedence again.
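A minimal sketch of the ordering issue, using toy classes rather than the real transformers configs. `ToyBaseConfig` is a hypothetical stand-in that, like the behaviour described above, stores leftover kwargs as attributes; the three subclasses and the numeric values are illustrative assumptions only.

```python
class ToyBaseConfig:
    """Hypothetical base config: stores every leftover kwarg as an attribute."""

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


class BaseInitLast(ToyBaseConfig):
    """Old order: derived default first, base __init__ last, so kwargs win."""

    def __init__(self, hidden_size=8, expand=2, **kwargs):
        self.hidden_size = hidden_size
        self.intermediate_size = int(expand * self.hidden_size)
        super().__init__(**kwargs)


class BaseInitFirst(ToyBaseConfig):
    """New order: base __init__ first, so the derived default clobbers kwargs."""

    def __init__(self, hidden_size=8, expand=2, **kwargs):
        super().__init__(**kwargs)
        self.hidden_size = hidden_size
        self.intermediate_size = int(expand * self.hidden_size)


class BaseInitFirstFixed(ToyBaseConfig):
    """New order plus the explicit assignment from the diff."""

    def __init__(self, hidden_size=8, expand=2, **kwargs):
        super().__init__(**kwargs)
        self.hidden_size = hidden_size
        # Prefer the value passed via kwargs; fall back to the derived default.
        self.intermediate_size = (
            int(expand * self.hidden_size)
            if kwargs.get("intermediate_size") is None
            else kwargs.get("intermediate_size")
        )


print(BaseInitLast(intermediate_size=10).intermediate_size)        # 10
print(BaseInitFirst(intermediate_size=10).intermediate_size)       # 16
print(BaseInitFirstFixed(intermediate_size=10).intermediate_size)  # 10
```

With the base initializer running first, the only way for a stored `intermediate_size` to survive is for the subclass to consult kwargs again after computing the default.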

OK, I understand better. And here, since intermediate_size is stored in tiiuae/falcon-mamba-7b, it is passed as a kwarg during loading, which causes the issue.

Looks like this issue is something that could happen quite frequently, and we have to be careful (since modular_converter forces it, as you mentioned).

One final nit question: do you know why we have a config (i.e. tiiuae/falcon-mamba-7b) that has intermediate_size != int(expand * self.hidden_size)? It sounds a bit strange.

Yeah, we have to be careful. In general, it is counterintuitive to have values hardcoded in the config initialization. I think kwargs should always take precedence there.

So to me, the proper fix would be to change MambaConfig. As for the question, it is just their design choice: the hidden size was very big, and they chose to make the intermediates and the convs smaller.

ydshieh

OK for me given the explanation, but it would be better to get a second review.

* fix mamba regression * fix compile test

fix mamba regression

a7abc0a

manueldeprada requested review from ydshieh and removed request for ydshieh July 28, 2025 11:24

manueldeprada added 2 commits July 28, 2025 13:54

fix

da44dc3

fix

8ef822f

huggingface deleted a comment from github-actions Bot Jul 28, 2025

manueldeprada requested a review from ydshieh July 28, 2025 12:03

fix compile test

29ffcc1

manueldeprada requested a review from gante July 28, 2025 14:38

ydshieh reviewed Jul 28, 2025


ydshieh approved these changes Jul 28, 2025


ArthurZucker approved these changes Jul 29, 2025


ArthurZucker added the for patch Tag issues / labels that should be included in the next patch label Jul 29, 2025

manueldeprada merged commit cf97f6c into huggingface:main Jul 29, 2025
20 of 21 checks passed

ArthurZucker pushed a commit that referenced this pull request Jul 29, 2025

Fix mamba regression (#39728)

ab2a609

* fix mamba regression * fix compile test

zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025

Fix mamba regression (huggingface#39728)

8d927c4

* fix mamba regression * fix compile test

zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025

Fix mamba regression (huggingface#39728)

19bec3f

* fix mamba regression * fix compile test

zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025

Fix mamba regression (huggingface#39728)

a7c4e39

* fix mamba regression * fix compile test

zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025

Fix mamba regression (huggingface#39728)

8614a8d

* fix mamba regression * fix compile test

zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025

Fix mamba regression (huggingface#39728)

8dd2e57

* fix mamba regression * fix compile test

zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025

Fix mamba regression (huggingface#39728)

eba682b

* fix mamba regression * fix compile test

zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025

Fix mamba regression (huggingface#39728)

6f38ef4

* fix mamba regression * fix compile test
