
Fix Mamba2ForCausalLM weight tying #43207

Merged

vasqu merged 3 commits into huggingface:main from Anri-Lombard:fix-mamba2-output-embeddings on Jan 22, 2026

Conversation

@Anri-Lombard (Contributor) commented Jan 10, 2026

What does this PR do?

Fixes #43206

Adds the _tied_weights_keys mapping to Mamba2ForCausalLM to enable proper weight tying when tie_word_embeddings=True.

The Bug

When tie_word_embeddings=True, the embedding weights should be shared with the lm_head. However, Mamba2ForCausalLM had:

_tied_weights_keys = {}  # Empty - weight tying never happens

This caused:

  • Weights were never actually tied (lm_head and the input embeddings remained distinct tensors)
  • resize_token_embeddings() did not resize lm_head along with the embeddings
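
For context, here is a minimal sketch of the mechanism (an illustrative simplification, not the actual transformers internals, which also handle resizing and device placement). Each {target: source} pair in the mapping rebinds the target parameter to the source tensor, so with an empty mapping the loop never runs and nothing gets tied:

import torch.nn as nn

def tie_weights(model: nn.Module, tied_weights_keys: dict[str, str]) -> None:
    # Rebind each target parameter to its source so both names share storage.
    for target_name, source_name in tied_weights_keys.items():
        source = model.get_parameter(source_name)      # e.g. backbone.embeddings.weight
        *parent_path, attr = target_name.split(".")    # e.g. ["lm_head"], "weight"
        parent = model.get_submodule(".".join(parent_path))
        setattr(parent, attr, source)                  # lm_head.weight now aliases the embeddings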

The Fix

_tied_weights_keys = {"lm_head.weight": "backbone.embeddings.weight"}

This is the standard pattern used by MambaForCausalLM (v1), GPT2, LLaMA, and other models.

Verification

from transformers import Mamba2ForCausalLM, Mamba2Config

config = Mamba2Config(vocab_size=1000, tie_word_embeddings=True)
model = Mamba2ForCausalLM(config)

# Weights are now properly tied
assert model.lm_head.weight.data_ptr() == model.backbone.embeddings.weight.data_ptr()

# Resize works correctly
model.resize_token_embeddings(1100)
assert model.lm_head.weight.shape[0] == 1100
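
A further check, not in the PR description (it assumes resize_token_embeddings re-ties the weights after allocating the new embedding matrix, which is the expected behavior when tie_word_embeddings=True):

# Continuing the snippet above: the tie should survive the resize
assert model.lm_head.weight.data_ptr() == model.backbone.embeddings.weight.data_ptr()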

Add _tied_weights_keys mapping to enable proper weight tying when
tie_word_embeddings=True. This is the standard pattern used by
MambaForCausalLM, GPT2, LLaMA, and other models.

Fixes huggingface#43206
@Anri-Lombard force-pushed the fix-mamba2-output-embeddings branch from 2a2a93b to d596364 on January 11, 2026 05:37
@Anri-Lombard changed the title from "Add get/set_output_embeddings to Mamba2ForCausalLM" to "Fix Mamba2ForCausalLM weight tying" on Jan 11, 2026
@vasqu (Contributor) left a comment

Can you add a fast test as a regression test?

@Anri-Lombard (Contributor, Author)

Thanks! Enabled tie_word_embeddings=True in the ModelTester to run the standard weight tying regression test. Are you happy with that? 🙏
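
A hedged sketch of what that tester change could look like (the get_config wiring is an assumption about tests/models/mamba2/test_modeling_mamba2.py, not the verbatim diff):

class Mamba2ModelTester:
    def get_config(self):
        # Opting into tying makes the shared test suite's weight-tying check run;
        # other config fields omitted for brevity.
        return Mamba2Config(
            vocab_size=self.vocab_size,
            tie_word_embeddings=True,
        )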

@vasqu (Contributor) commented Jan 12, 2026

Can we instead create a small test for this? It seems the original model did not have tied weights, so this is more of an addition IMO, and we should check it with an explicit test (it would be nice to link/mention your issue as well).

Replace ModelTester default with explicit test per reviewer feedback.
@github-actions (Contributor)

[For maintainers] Suggested jobs to run (before merge)

run-slow: mamba2

@Anri-Lombard (Contributor, Author)

@vasqu Added an explicit regression test that checks both tie_word_embeddings=True and tie_word_embeddings=False 🙏
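
A hedged sketch of such a test (the name test_tied_weight_embeddings and the issue link match the review excerpt further down; the body is an assumed reconstruction, not the merged code):

def test_tied_weight_embeddings(self):
    """Regression test for https://github.com/huggingface/transformers/issues/43206."""
    for tie in (True, False):
        config = Mamba2Config(vocab_size=1000, tie_word_embeddings=tie)
        model = Mamba2ForCausalLM(config)
        # With tying on, both names must point at the same storage; with it off, they must not.
        tied = model.lm_head.weight.data_ptr() == model.backbone.embeddings.weight.data_ptr()
        self.assertEqual(tied, tie)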

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vasqu (Contributor) left a comment

Thank you, just checking with run-slow once to be paranoid-safe, then merging :D

        check_equivalence(model, tuple_inputs, dict_inputs, {"output_hidden_states": True})

    def test_tied_weight_embeddings(self):
        """Regression test for https://github.com/huggingface/transformers/issues/43206."""
Contributor

Awesome, thanks for linking 🙏

@vasqu (Contributor) commented Jan 22, 2026

run-slow: mamba2

@github-actions (Contributor)

This comment contains run-slow, running the specified jobs:

models: ["models/mamba2"]
quantizations: []

@github-actions (Contributor)

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉!

@vasqu merged commit 10e97cd into huggingface:main on Jan 22, 2026
20 checks passed
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request on Jan 23, 2026:
* Fix Mamba2ForCausalLM weight tying

Add _tied_weights_keys mapping to enable proper weight tying when
tie_word_embeddings=True. This is the standard pattern used by
MambaForCausalLM, GPT2, LLaMA, and other models.

Fixes huggingface#43206

* Enable weight tying in Mamba2ModelTester for regression testing

* Add explicit regression test for Mamba2 weight tying

Replace ModelTester default with explicit test per reviewer feedback.


Development

Successfully merging this pull request may close these issues:

Mamba2ForCausalLM missing get_output_embeddings/set_output_embeddings breaks resize_token_embeddings
