Fix Mamba2ForCausalLM weight tying #43207
Conversation
Add _tied_weights_keys mapping to enable proper weight tying when tie_word_embeddings=True. This is the standard pattern used by MambaForCausalLM, GPT2, LLaMA, and other models. Fixes huggingface#43206
vasqu left a comment
Can you add a fast test as a regression test?
Thanks! Enabled
Can we instead create a small test for this? It seems that the original model did not have tied weights, so this is more of an addition IMO, and we should check this as an explicit test (would be nice to link / mention your issue as well)
Replace ModelTester default with explicit test per reviewer feedback.
[For maintainers] Suggested jobs to run (before merge): run-slow: mamba2
@vasqu Added an explicit regression test that checks both `tie_word_embeddings=True` and `tie_word_embeddings=False` 🙏

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
vasqu left a comment
Thank you, just checking with run-slow once to be paranoid-safe, then merging :D
    check_equivalence(model, tuple_inputs, dict_inputs, {"output_hidden_states": True})

    def test_tied_weight_embeddings(self):
        """Regression test for https://github.com/huggingface/transformers/issues/43206."""
There was a problem hiding this comment.
Awesome, thanks for linking 🙏
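For context, the diff snippet above shows only the test's signature. A plausible shape for the body (a sketch assuming the surrounding Mamba2 test class, its `model_tester.get_config()` helper, and a top-level `Mamba2ForCausalLM` import — not the merged code) could be:

```python
def test_tied_weight_embeddings(self):
    """Regression test for https://github.com/huggingface/transformers/issues/43206."""
    config = self.model_tester.get_config()

    # Tied: lm_head must be the very same Parameter object as the input embeddings.
    config.tie_word_embeddings = True
    model = Mamba2ForCausalLM(config)
    self.assertIs(model.get_output_embeddings().weight, model.get_input_embeddings().weight)

    # Untied: the two weights must stay independent parameters.
    config.tie_word_embeddings = False
    model = Mamba2ForCausalLM(config)
    self.assertIsNot(model.get_output_embeddings().weight, model.get_input_embeddings().weight)
```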
run-slow: mamba2

This comment contains models: ["models/mamba2"]
CI Results: ✅ No failing test specific to this PR 🎉!
* Fix Mamba2ForCausalLM weight tying

  Add _tied_weights_keys mapping to enable proper weight tying when tie_word_embeddings=True. This is the standard pattern used by MambaForCausalLM, GPT2, LLaMA, and other models. Fixes huggingface#43206

* Enable weight tying in Mamba2ModelTester for regression testing

* Add explicit regression test for Mamba2 weight tying

  Replace ModelTester default with explicit test per reviewer feedback.
What does this PR do?
Fixes #43206
Adds the `_tied_weights_keys` mapping to `Mamba2ForCausalLM` to enable proper weight tying when `tie_word_embeddings=True`.

The Bug
When `tie_word_embeddings=True`, the embedding weights should be shared with the `lm_head`. However, `Mamba2ForCausalLM` was missing the `_tied_weights_keys` mapping, so the weights were never actually tied. This caused:
- `resize_token_embeddings()` not resizing `lm_head` properly

The Fix
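The change is essentially a single class attribute. Below is an excerpt-style sketch of the pattern, not a standalone script; the dict form and the exact key names (`lm_head.weight`, `backbone.embeddings.weight`) are assumptions based on Mamba2's module layout and the current `_tied_weights_keys` convention:

```python
class Mamba2ForCausalLM(Mamba2PreTrainedModel, GenerationMixin):  # excerpt, not standalone
    # Maps the tied parameter to its source so PreTrainedModel.tie_weights()
    # and resize_token_embeddings() treat lm_head.weight as an alias of the
    # backbone embedding weight. Exact key names are an assumption here.
    _tied_weights_keys = {"lm_head.weight": "backbone.embeddings.weight"}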
This is the standard pattern used by `MambaForCausalLM` (v1), GPT2, LLaMA, and other models.

Verification
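The PR's verification output isn't preserved here. A quick manual check along these lines (a sketch: the config sizes are arbitrary small values chosen for illustration, and the `data_ptr()` comparison is just one way to confirm storage sharing) would exercise both the tying and the resize path:

```python
from transformers import Mamba2Config, Mamba2ForCausalLM

# Tiny illustrative config; num_heads * head_dim must equal expand * hidden_size.
config = Mamba2Config(
    vocab_size=128, hidden_size=64, num_hidden_layers=2,
    expand=2, num_heads=4, head_dim=32, tie_word_embeddings=True,
)
model = Mamba2ForCausalLM(config)

# With the fix, lm_head shares storage with the input embeddings.
assert model.lm_head.weight.data_ptr() == model.get_input_embeddings().weight.data_ptr()

# Resizing the vocabulary must keep the tie and resize lm_head too.
model.resize_token_embeddings(160)
assert model.lm_head.weight.shape[0] == 160
assert model.lm_head.weight.data_ptr() == model.get_input_embeddings().weight.data_ptr()
```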