
Fix mamba regression#39728

Merged
manueldeprada merged 4 commits into huggingface:main from manueldeprada:fix-mamba
Jul 29, 2025
Conversation

@manueldeprada (Contributor) commented Jul 28, 2025:

This fixes the sneaky regression introduced in #38086 causing loading errors for falcon_mamba:

    tests/models/falcon_mamba/test_modeling_falcon_mamba.py::FalconMambaModelTest::test_model_from_pretrained
    tests/models/falcon_mamba/test_modeling_falcon_mamba.py::FalconMambaIntegrationTests::test_batched_generation
    tests/models/falcon_mamba/test_modeling_falcon_mamba.py::FalconMambaIntegrationTests::test_generation_4bit
    tests/models/falcon_mamba/test_modeling_falcon_mamba.py::FalconMambaIntegrationTests::test_generation_fp16
    tests/models/falcon_mamba/test_modeling_falcon_mamba.py::FalconMambaIntegrationTests::test_generation_torch_compile

All five tests fail with the same trace:

    (line 2593)  RuntimeError: Error(s) in loading state_dict for FalconMambaMixer:
The gist of the problem: modular forces the super().__init__() call to be at the top of FalconMambaConfig.__init__(). Before the modular rewrite it was at the bottom, which was critical for the intermediate_size value stored in the config file to take effect.
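The ordering issue can be reproduced with a minimal sketch (toy classes, not the real transformers classes; `ToyBaseConfig` stands in for PretrainedConfig's copy-kwargs-to-attributes behavior, and the names are hypothetical):

```python
class ToyBaseConfig:
    """Stands in for PretrainedConfig: copies every kwarg onto the instance."""
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)


class ToyMambaConfig(ToyBaseConfig):
    """The modular order: super().__init__() runs first, so the derived
    value later clobbers the intermediate_size loaded from the config file."""
    def __init__(self, hidden_size=64, expand=2, **kwargs):
        super().__init__(**kwargs)  # sets intermediate_size from kwargs...
        self.hidden_size = hidden_size
        self.intermediate_size = int(expand * hidden_size)  # ...then overwrites it


class FixedToyMambaConfig(ToyBaseConfig):
    """The fix in this PR: only derive intermediate_size when it was
    not passed explicitly, so the checkpoint value takes precedence."""
    def __init__(self, hidden_size=64, expand=2, **kwargs):
        super().__init__(**kwargs)
        self.hidden_size = hidden_size
        self.intermediate_size = (
            int(expand * hidden_size)
            if kwargs.get("intermediate_size") is None
            else kwargs.get("intermediate_size")
        )


# Simulate loading a checkpoint whose config stores intermediate_size=100
broken = ToyMambaConfig(intermediate_size=100)
fixed = FixedToyMambaConfig(intermediate_size=100)
print(broken.intermediate_size)  # 128: the checkpoint value is lost
print(fixed.intermediate_size)   # 100: the checkpoint value survives
```

With the old (pre-#38086) bottom-of-`__init__` ordering, the base-class kwargs assignment ran last and won, which is why the regression only surfaced after the modular rewrite reordered the calls.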

Second bug fixed: tests/models/mamba/test_modeling_mamba.py::MambaIntegrationTests::test_compile_mamba_cache was failing due to a misplaced import.

@manueldeprada manueldeprada requested review from ydshieh and removed request for ydshieh July 28, 2025 11:24
@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@huggingface huggingface deleted a comment from github-actions Bot Jul 28, 2025
@manueldeprada (Contributor, Author) commented:

run-slow: falcon_mamba

@github-actions (Contributor) commented:

This comment contains run-slow, running the specified jobs:

models: ['models/falcon_mamba']
quantizations: [] ...

@manueldeprada manueldeprada requested a review from ydshieh July 28, 2025 12:03
@github-actions (Contributor) commented:

[For maintainers] Suggested jobs to run (before merge)

run-slow: falcon_mamba, mamba

@manueldeprada (Contributor, Author) commented:

run-slow: falcon_mamba, mamba

@github-actions (Contributor) commented:

This comment contains run-slow, running the specified jobs:

models: ['models/falcon_mamba', 'models/mamba']
quantizations: [] ...

@manueldeprada manueldeprada requested a review from gante July 28, 2025 14:38
    if is_mambapy_available():
        from mambapy.pscan import pscan
    else:
        pscan = None
Collaborator commented:

does this model require mambapy.pscan, or could it work with pscan = None too?

@manueldeprada (Contributor, Author) replied Jul 28, 2025:

It uses pscan when the fast path (the mambapy library) is available. Otherwise it defaults to a slow pure-Python forward pass.

The import was at the top before; I moved it for debugging and it slipped in 😅, now I am reverting the change.

Collaborator replied:

sure.

(FYI, so for the test of this model, we are testing the slow path I think)
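The optional-dependency guard under discussion can be sketched in isolation. This is a sketch, assuming a plain importlib-based availability check rather than the actual transformers is_mambapy_available helper; run_scan and slow_python_scan are hypothetical stand-ins for the mixer's forward paths:

```python
import importlib.util


def is_mambapy_available() -> bool:
    # True only if the optional mambapy package can be imported in the
    # current environment; find_spec probes without importing anything.
    return importlib.util.find_spec("mambapy") is not None


# Module-level guard, as in the snippet above: resolve the fast-path
# kernel once at import time so callers only need to test `pscan is None`.
if is_mambapy_available():
    from mambapy.pscan import pscan
else:
    pscan = None


def slow_python_scan(x):
    # Stand-in for the slow pure-Python forward pass (identity here,
    # purely for illustration).
    return x


def run_scan(x):
    # Hypothetical caller: fast kernel when available, slow fallback otherwise.
    if pscan is not None:
        return pscan(x)
    return slow_python_scan(x)
```

Keeping the guard at module top (rather than inside a function, where it slipped during debugging) matters: the fast/slow decision is made once at import time, and the fallback path is what CI exercises when mambapy is not installed.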

Comment on lines +195 to +200
    # This is needed since mamba overrides the intermediate_size attribute
    self.intermediate_size = (
        int(expand * self.hidden_size)
        if kwargs.get("intermediate_size") is None
        else kwargs.get("intermediate_size")
    )
Collaborator commented:

Could you explain this part a bit more for me 🙏 , but I believe you are right.

@manueldeprada (Contributor, Author) replied Jul 28, 2025:

When using modular_converter, super().__init__() unravels MambaConfig.__init__(), which sets intermediate_size to int(expand * self.hidden_size), overriding any value passed via kwargs.

Before #38086, setting intermediate_size to int(expand * self.hidden_size) wasn't an issue because PretrainedConfig.__init__() was called last, and the kwargs value prevailed.

However, #38086 reversed that order due to modular_converter, which forces PretrainedConfig.__init__() to run first, thus overwriting the intermediate_size passed via kwargs.

The new code explicitly assigns intermediate_size to ensure the kwargs value takes precedence again.

Collaborator replied:

OK, I understand better. And here, since intermediate_size is stored in the tiiuae/falcon-mamba-7b config, it is passed as a kwarg during loading, causing the issue.

Looks like this issue is something that could happen quite frequently, and we have to be careful (since modular_converter forces the ordering, as you mentioned).

Collaborator commented:

One final nit question: do you know why we have a config (i.e. tiiuae/falcon-mamba-7b) that has intermediate_size != int(expand * self.hidden_size)? Sounds a bit strange.

@manueldeprada (Contributor, Author) replied:

Yeah, we have to be careful. In general, it is counterintuitive to have values hardcoded in the config initialization; I think kwargs should always take precedence there.

So to me, the good fix would be to change MambaConfig. As for the question, it is just their design choice: hidden_size was very big, and they just chose to make the intermediate and conv dimensions smaller.

@ydshieh (Collaborator) left a review:

OK for me from the explanation, but better for a second review.

@ArthurZucker ArthurZucker added the for patch Tag issues / labels that should be included in the next patch label Jul 29, 2025
@manueldeprada manueldeprada merged commit cf97f6c into huggingface:main Jul 29, 2025
20 of 21 checks passed
ArthurZucker pushed a commit that referenced this pull request Jul 29, 2025
* fix mamba regression

* fix compile test
zaristei pushed a commit to zaristei/transformers that referenced this pull request Sep 9, 2025
* fix mamba regression

* fix compile test
Labels

for patch Tag issues / labels that should be included in the next patch


4 participants