
Correctly create tied key mapping in post_init, and dynamic tie weight#42270

Merged
Cyrilvallez merged 18 commits into main from dynamic-tie on Nov 21, 2025
Conversation

@Cyrilvallez
Member

@Cyrilvallez Cyrilvallez commented Nov 19, 2025

As we rely more and more on self.all_tied_weights_keys everywhere (i.e. the list of tied keys obtained during post_init) for multiple manipulations (device_map computation, cuda warmup, post-processing of from_pretrained, ...), it becomes very important that the (few) models whose _tied_weights_keys mapping contains regex patterns have those patterns expanded in all_tied_weights_keys as well, instead of leaving raw patterns that every downstream application has to skip in its own way.
This PR fixes that by expanding the patterns correctly at post_init time, so the mapping contains the correct parameter names everywhere.
It also allows recomputing this mapping dynamically in tie_weights, so that it stays correct when tie_weights is called after the config has been modified.
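The expansion step can be illustrated with a minimal sketch. The function name `expand_tied_weights_keys` and the exact mapping shape are hypothetical stand-ins for what the PR does inside post_init: each regex pattern is matched against the model's real parameter names, producing a concrete {target_name: source_name} mapping with no patterns left in it.

```python
import re

def expand_tied_weights_keys(tied_patterns, parameter_names):
    # Hypothetical sketch: expand regex patterns from a
    # `_tied_weights_keys`-style mapping into concrete
    # {target_param: source_param} pairs over the model's parameter names.
    expanded = {}
    for target_pattern, source_name in tied_patterns.items():
        # fullmatch avoids accidental prefix matches
        # (e.g. "lm_head.weight" must not match "lm_head.weight_scale").
        pattern = re.compile(target_pattern)
        for name in parameter_names:
            if pattern.fullmatch(name):
                expanded[name] = source_name
    return expanded

# Simple case: one literal key.
params = ["model.embed_tokens.weight", "lm_head.weight"]
mapping = expand_tied_weights_keys(
    {r"lm_head\.weight": "model.embed_tokens.weight"}, params
)
# mapping == {"lm_head.weight": "model.embed_tokens.weight"}

# T5-like case: one pattern expanding to several concrete keys.
params2 = ["decoder.embed_tokens.weight", "encoder.embed_tokens.weight", "shared.weight"]
mapping2 = expand_tied_weights_keys(
    {r"(en|de)coder\.embed_tokens\.weight": "shared.weight"}, params2
)
```

With the patterns expanded up front, downstream consumers (device_map computation, warmup, from_pretrained post-processing) can treat every entry as a plain parameter name.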

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Cyrilvallez Cyrilvallez changed the title Dynamic tie weight, and full mapping in post_init Correctly create tied key mapping in post_init, and dynamic tie weight Nov 19, 2025
Collaborator

@ArthurZucker ArthurZucker left a comment


Missing a bit of representative doc! Let's take T5 as an example? Or RT-DETR? To have a complex list

Comment thread src/transformers/modeling_utils.py Outdated
for prefix, submodule in self.named_modules():
if isinstance(submodule, PreTrainedModel):
# Will dynamically check the config if it has changed
submodel_tied_weights = submodule.get_expanded_tied_weights_keys(all_submodels=False)
Collaborator


don't know if we really have to go the inheritance path here?

Collaborator


given that we do named_parameters afterwards

Member Author


Yes, in order to check the proper subconfig... Unfortunately there is no better way, as sometimes we cannot retrieve the subconfig cleanly otherwise

source_name = "^" + source_name
target_name = "^" + target_name
# In this case, the keys stored in `all_tied_weights_keys` are already correct
if not recompute_mapping:
Collaborator


to update with setter and getter for tie_word_embeddings, no?

Member Author


No, was already checked before!

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: esm, hubert, idefics, openai, sew, sew_d, unispeech, unispeech_sat, wav2vec2, wavlm

Collaborator

@ArthurZucker ArthurZucker left a comment


Thanks for iterating! I like that it's explicit now!

@Cyrilvallez Cyrilvallez merged commit ce7a5e0 into main Nov 21, 2025
11 of 24 checks passed
@Cyrilvallez Cyrilvallez deleted the dynamic-tie branch November 21, 2025 16:02
@lugimzzz

@Cyrilvallez @ArthurZucker
I have a concern: with the changes in this PR, a model like Qwen/Qwen3-0.6B — which is a tie-weight model — is now treated as a non–tie-weight model. In its config.json, "tie_word_embeddings": true is set, and lm_head.weight and embed_tokens.weight actually share identical values. However, after this PR, the model is no longer recognized as a tie-weight model.

As a result, when training Qwen/Qwen3-0.6B, the gradients update completely different parameters before and after this PR. I believe the original behavior is more consistent with the model’s intended design.
Reference model: https://huggingface.co/Qwen/Qwen3-0.6B
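Whether two weights are actually tied (as opposed to merely value-equal) comes down to whether they share the same underlying storage. In torch this is the data_ptr() comparison mentioned in the follow-up fix; the sketch below shows the same idea with a NumPy analogue, since the distinction is framework-agnostic. The helper name `shares_storage` is hypothetical.

```python
import numpy as np

def shares_storage(a: np.ndarray, b: np.ndarray) -> bool:
    # Hypothetical analogue of comparing torch tensors' data_ptr():
    # tied weights point at the same underlying buffer,
    # a value-equal copy does not.
    return a.__array_interface__["data"][0] == b.__array_interface__["data"][0]

embed_tokens = np.zeros((8, 4), dtype=np.float32)
lm_head_tied = embed_tokens          # tied: same buffer, one set of gradients
lm_head_copy = embed_tokens.copy()   # equal values, but a separate buffer

print(shares_storage(embed_tokens, lm_head_tied))   # True
print(shares_storage(embed_tokens, lm_head_copy))   # False
```

This is why the training behavior differs: with shared storage a single parameter receives the gradients from both uses, while a copy gives two independently updated parameters.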

@Cyrilvallez
Member Author

Hey @lugimzzz! You can check-out my answer to the same question here! Let me know if you want more clarification!

SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
Correctly create tied key mapping in post_init, and dynamic tie weight (huggingface#42270)

* add dynamic

* improve

* doc

* true dynamic

* everywhere

* improve

* fix

* more

* small fix

* small fix

* fix duplicates

* fix

* doc

* fix

* improve doc

* comment

* more doc

* style
lordaarush added a commit to lordaarush/transformers that referenced this pull request Feb 12, 2026
Fixes huggingface#43883

After huggingface#42270, all_tied_weights_keys is initialized in post_init(), but remote models loaded with trust_remote_code=True don't always call post_init() properly, causing AttributeError when loading models like Molmo.

This fix adds defensive checks in two methods:
- _adjust_tied_keys_with_tied_pointers(): Initialize empty dict if missing, then detect tied weights via data pointers
- mark_tied_weights_as_initialized(): Return early if attribute missing

This allows remote models to load successfully while maintaining tied weight detection.
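The guard pattern described above can be sketched as follows. The method names come from the commit message; the bodies are a hypothetical minimal illustration, not the actual transformers implementation:

```python
# Hypothetical sketch of the defensive checks described above: both methods
# must tolerate a model whose post_init() never ran, so the
# `all_tied_weights_keys` attribute may be missing entirely (as with some
# trust_remote_code=True models like Molmo).
class TiedWeightsMixin:
    def _adjust_tied_keys_with_tied_pointers(self, tied_pairs):
        if not hasattr(self, "all_tied_weights_keys"):
            # Defensive check 1: initialize an empty dict if missing,
            # then fall through to data-pointer-based detection.
            self.all_tied_weights_keys = {}
        # ... detection of tied weights via data pointers omitted here ...
        self.all_tied_weights_keys.update(tied_pairs)

    def mark_tied_weights_as_initialized(self):
        if not hasattr(self, "all_tied_weights_keys"):
            # Defensive check 2: return early if the attribute is missing.
            return
        # ... marking of tied parameters as initialized omitted here ...

model = TiedWeightsMixin()                  # post_init() never ran
model.mark_tied_weights_as_initialized()    # no AttributeError raised
model._adjust_tied_keys_with_tied_pointers(
    {"lm_head.weight": "model.embed_tokens.weight"}
)
```

Using hasattr guards rather than requiring post_init keeps remote models loadable while still recording tied weights when they are detected.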