Correctly create tied key mapping in post_init, and dynamic tie weight #42270
Cyrilvallez merged 18 commits into main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
ArthurZucker left a comment
missing a bit of representative doc! Let's take T5 as an example? Or RT-DETR? To have a complex list
```python
for prefix, submodule in self.named_modules():
    if isinstance(submodule, PreTrainedModel):
        # Will dynamically check the config if it has changed
        submodel_tied_weights = submodule.get_expanded_tied_weights_keys(all_submodels=False)
```
don't know if we really have to go the inheritance path here?
given that we do named_parameters afterwards
Yes, in order to check the proper subconfig... No better way unfortunately as sometimes we cannot get the subconfig in a proper way
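The traversal discussed in this thread can be sketched with a small standalone toy (no real `PreTrainedModel` here, just a stand-in class, so names and signatures are illustrative only): walk named submodules, collect each submodel's own tied keys, and re-prefix them with the submodule path so they are valid on the parent model.

```python
# Toy sketch of the submodule traversal; `Submodel` stands in for
# PreTrainedModel, and the tied keys are {target_name: source_name} dicts.
class Submodel:
    def __init__(self, tied):
        self._tied = tied

    def get_expanded_tied_weights_keys(self, all_submodels=False):
        # In the real code this would re-check the submodule's config
        return dict(self._tied)


def collect_tied_keys(named_modules):
    all_tied = {}
    for prefix, submodule in named_modules:
        if isinstance(submodule, Submodel):
            prefix_dot = f"{prefix}." if prefix else ""
            for target, source in submodule.get_expanded_tied_weights_keys(all_submodels=False).items():
                # Re-prefix so the keys are valid relative to the parent model
                all_tied[prefix_dot + target] = prefix_dot + source
    return all_tied


modules = [("decoder", Submodel({"lm_head.weight": "embed.weight"}))]
print(collect_tied_keys(modules))
# {'decoder.lm_head.weight': 'decoder.embed.weight'}
```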
```python
source_name = "^" + source_name
target_name = "^" + target_name
# In this case, the keys stored in `all_tied_weights_keys` are already correct
if not recompute_mapping:
```
To update with setter and getter for `tie_words_embedding`, no?
No, was already checked before!
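The anchoring behavior quoted in this thread can be illustrated with a hypothetical helper (the function names here are not from the PR): prefixing a tied-key pattern with `^` makes it match only from the start of a parameter path, which prevents accidental matches on nested submodule parameters.

```python
import re


def anchor(pattern: str) -> str:
    # Anchor the regex at the beginning of the parameter name, as in the diff
    return pattern if pattern.startswith("^") else "^" + pattern


def matches(pattern: str, param_name: str) -> bool:
    return re.match(anchor(pattern), param_name) is not None


print(matches(r"lm_head\.weight", "lm_head.weight"))          # True
print(matches(r"lm_head\.weight", "decoder.lm_head.weight"))  # False: anchored
```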
[For maintainers] Suggested jobs to run (before merge): run-slow: esm, hubert, idefics, openai, sew, sew_d, unispeech, unispeech_sat, wav2vec2, wavlm
ArthurZucker left a comment
Thanks for iterating! I like that it's explicit now!
@Cyrilvallez @ArthurZucker As a result, when training Qwen/Qwen3-0.6B, the gradients update completely different parameters before and after this PR. I believe the original behavior is more consistent with the model's intended design.
Correctly create tied key mapping in post_init, and dynamic tie weight (huggingface#42270)

* add dynamic
* improve
* doc
* true dynamic
* everywhere
* improve
* fix
* more
* small fix
* small fix
* fix duplicates
* fix
* doc
* fix
* improve doc
* comment
* more doc
* style
Fixes huggingface#43883

After huggingface#42270, `all_tied_weights_keys` is initialized in `post_init()`, but remote models loaded with `trust_remote_code=True` don't always call `post_init()` properly, causing an `AttributeError` when loading models like Molmo.

This fix adds defensive checks in two methods:

- `_adjust_tied_keys_with_tied_pointers()`: initialize an empty dict if missing, then detect tied weights via data pointers
- `mark_tied_weights_as_initialized()`: return early if the attribute is missing

This allows remote models to load successfully while maintaining tied weight detection.
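A hedged sketch of the two defensive checks described above (the method names follow the commit message, but the real transformers implementations do more work than shown here). Both tolerate remote-code models whose custom `__init__` never called `post_init()`:

```python
# Minimal stand-in for a remote-code model; the real class is a PreTrainedModel.
class RemoteModelSketch:
    def _adjust_tied_keys_with_tied_pointers(self):
        # Initialize an empty mapping if post_init() was skipped
        if not hasattr(self, "all_tied_weights_keys"):
            self.all_tied_weights_keys = {}
        # ...then detect tied weights via shared data pointers (omitted)...

    def mark_tied_weights_as_initialized(self):
        # Return early instead of raising AttributeError
        if not hasattr(self, "all_tied_weights_keys"):
            return
        # ...mark the tied parameters as initialized (omitted)...


m = RemoteModelSketch()              # post_init() never ran, attribute missing
m.mark_tied_weights_as_initialized() # no AttributeError, just returns
m._adjust_tied_keys_with_tied_pointers()
print(m.all_tied_weights_keys)       # {}
```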
As we rely more and more on `self.all_tied_weight_keys` everywhere (i.e. the list of tied keys obtained during `post_init`) for multiple manipulations (device_map computation, cuda warmup, post-processing of `from_pretrained`...), it becomes very important that the (few) models containing regex patterns for their `_tied_weights_keys` mapping have the patterns expanded to fit in `all_tied_weight_keys` as well, instead of containing simple patterns that are skipped in different ways for all downstream applications.

This PR fixes that, by expanding correctly at `post_init` time, so the mappings contain correct param names everywhere. It also allows for recomputing this mapping dynamically in `tie_weights`, so that it stays correct when calling `tie_weights` after having modified the config.
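The expansion described in this PR can be sketched as follows (a hypothetical standalone helper, not the actual transformers code): regex patterns from a model's `_tied_weights_keys` mapping (`{target_pattern: source_name}`) are matched against the real parameter names at `post_init` time, so that the resulting mapping only ever contains concrete, fully expanded names.

```python
import re


def expand_tied_weights_keys(tied_patterns, parameter_names):
    """Expand regex target patterns into concrete parameter names."""
    expanded = {}
    for target_pattern, source_name in tied_patterns.items():
        # Anchor the pattern at the start of the name, as in the PR diff
        regex = re.compile("^" + target_pattern)
        for name in parameter_names:
            if regex.match(name):
                expanded[name] = source_name
    return expanded


params = [
    "lm_head.weight",
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
]
print(expand_tied_weights_keys({r"lm_head\.weight": "model.embed_tokens.weight"}, params))
# {'lm_head.weight': 'model.embed_tokens.weight'}
```

Because the expansion reads the live parameter names, re-running it inside `tie_weights` is what makes the mapping stay correct after a config change.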