
[loading] Clean way to add/remove full parts in checkpoint names #45448

Merged
Cyrilvallez merged 37 commits into main from fix-clips on Apr 20, 2026

Conversation

@Cyrilvallez
Member

@Cyrilvallez Cyrilvallez commented Apr 15, 2026

What does this PR do?

As per the title.

The issue

The problem is that transforms that remove a full part of a weight name (such as a prefix, e.g. the leading model.) are non-bijective in general, i.e. the information is completely lost once the part is dropped. Adding it back later when saving is therefore impossible without runtime information about the checkpoint that was used, i.e. we need to know whether the prefix was present originally; we cannot infer it from anything else.

Proposed solution

This PR adds a simple mechanism for such cases: WeightTransform gets a simple flag describing whether it was actually used to rename a weight or not. If it was, we keep it when we save the Transform on the model (this was already done before). If not, we drop it, so that it is not applied when resaving.
It also introduces the PrefixChange class (a simple class inheriting from WeightRenaming) to simplify the addition/removal of full name parts, because the regexes needed in such cases are otherwise hard to read/write.
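
To illustrate why the runtime flag is needed, here is a minimal sketch (strip_prefix is a made-up helper, not the actual WeightTransform API):

def strip_prefix(name: str, prefix: str = "model.") -> str:
    # Drop the full prefix if present; otherwise return the name unchanged.
    return name[len(prefix):] if name.startswith(prefix) else name

# Both inputs map to "embed_tokens.weight": once stripped, the name alone no
# longer tells us whether the prefix was ever there, so the transform must
# record at load time whether it actually renamed anything.
assert strip_prefix("model.embed_tokens.weight") == strip_prefix("embed_tokens.weight")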

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Cyrilvallez Cyrilvallez changed the title from Fix clips to [loading] Clean way to add/remove full parts in checkpoint names Apr 16, 2026
Comment on lines 580 to -582

@dataclass(slots=True)
Member Author

@Cyrilvallez Cyrilvallez Apr 16, 2026


They were dataclasses, but it did not make much sense, so I removed that (while keeping the slots, the only feature we were really using from dataclass) - it makes inheriting etc. much easier.
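
For context, a tiny sketch (the class names are made up, not the library's) of keeping __slots__ on plain classes while subclassing stays straightforward:

class BaseTransform:
    __slots__ = ("source_patterns",)

    def __init__(self, source_patterns):
        self.source_patterns = source_patterns

class RenamingTransform(BaseTransform):
    # Only the attributes new to the subclass are declared here.
    __slots__ = ("target_patterns",)

    def __init__(self, source_patterns, target_patterns):
        super().__init__(source_patterns)
        self.target_patterns = target_patterns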

@albertvillanova
Member

albertvillanova commented Apr 17, 2026

Thanks for addressing this issue, @Cyrilvallez.

I have tested trl using your PR, and unfortunately it seems there is the same renaming issue I mentioned in the previous PR: #45361 (comment)

However, it looks like there is still a change in parameter naming: the vision_model nesting was eliminated (as I commented in the trl issue: huggingface/trl#5497 (comment)).

For example:

  • model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight before
  • model.vision_tower.encoder.layers.1.self_attn.k_proj.weight now

Is this renaming intended? It is the reason why our tests are still failing: they rely on the previous nesting structure of the names.

@Cyrilvallez
Member Author

@albertvillanova The renaming is fully intended in order to load the weights. Since the module was removed, the weights need to be renamed to match the actual module graph. However, if you resave the weights afterwards, the names should be reverted back to the original ones. Could you please point me to the exact code that triggers the issue? From my own tests, everything was fine.

@Cyrilvallez
Member Author

If you compare the initial weights to model.state_dict() or similar, it is expected that they do not match, as the model actually sees other names.

@albertvillanova
Member

albertvillanova commented Apr 17, 2026

@Cyrilvallez this is the test code: https://github.com/huggingface/trl/blob/a09320e384461bc2a1bf301578bdc2c71fdc91b5/tests/test_dpo_trainer.py#L1085-L1102

previous_trainable_params = {n: param.clone() for n, param in trainer.model.named_parameters()}

trainer.train()

for n, param in previous_trainable_params.items():
    if model_id == "trl-internal-testing/tiny-LlavaForConditionalGeneration" and "model.vision_tower.vision_model.encoder.layers.1" in n:
        continue

@Cyrilvallez
Member Author

Ah, I see, it's what I thought. You indeed need to change the hardcoded name on your side in this case - the weight name in the model itself has changed and cannot be changed back, since the architecture was modified. All parameter names inspected on-the-fly will have their vision_model part removed, as it does not exist anymore in the module graph. Likewise, if you were to inspect the modules themselves, the previous CLIPVisionTransformer has been fully removed.
Only when saving the model do we make sure it's added back, but when inspecting the state_dict directly, it won't be present anymore!
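
As a rough illustration of that distinction (the model id comes from the trl test above; the exact on-disk key layout is an assumption on my side):

from transformers import LlavaForConditionalGeneration

model = LlavaForConditionalGeneration.from_pretrained(
    "trl-internal-testing/tiny-LlavaForConditionalGeneration"
)
# Runtime names follow the new module graph, so "vision_model" is gone:
print(any("vision_model" in n for n, _ in model.named_parameters()))

# On save, the renaming is reverted, so the serialized checkpoint should keep
# the original "vision_tower.vision_model...." key layout.
model.save_pretrained("tmp-llava-checkpoint")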

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: altclip

@albertvillanova
Member

@Cyrilvallez, thanks a lot for your clear explanation. This confirms my initial guess in relation to the downstream trl issue:

huggingface/trl#5497 (comment)

you basically removed CLIPVisionTransformer and integrated it into CLIPVisionModel, removing one nesting level: there is no vision_model anymore (see the sketch after the list below).

huggingface/trl#5497 (comment)

CLIPVisionModel:

  • Before: self.vision_model = CLIPVisionTransformer(config)
    • where CLIPVisionTransformer: self.encoder = CLIPEncoder(config)
  • Now: self.encoder = CLIPEncoder(config)
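
A stand-in sketch of that structural change (placeholder modules, not the real CLIP classes), showing how one nesting level disappears from the parameter names:

import torch.nn as nn

class _Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(4, 4)])

class VisionModelBefore(nn.Module):
    def __init__(self):
        super().__init__()
        self.vision_model = nn.Module()          # stands in for CLIPVisionTransformer
        self.vision_model.encoder = _Encoder()

class VisionModelNow(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = _Encoder()                # encoder attached directly

print(next(iter(VisionModelBefore().state_dict())))  # vision_model.encoder.layers.0.weight
print(next(iter(VisionModelNow().state_dict())))     # encoder.layers.0.weight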

Regarding your comment:

You indeed need to change the hardcoded name on your side in this case

this therefore also confirms my original fix for trl:

Thanks, @Cyrilvallez! 🤗

@Cyrilvallez
Member Author

Got offline approval from @vasqu and @zucchini-nlp! Merging!

@Cyrilvallez Cyrilvallez merged commit ad0c0f9 into main Apr 20, 2026
29 checks passed
@Cyrilvallez Cyrilvallez deleted the fix-clips branch April 20, 2026 07:17
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Apr 20, 2026
The model structure of Clip and similar models sharing the architecture
has changed in Transformers. See:

- huggingface/transformers#45361
- huggingface/transformers#45448

The test was updated to reflect the change.
BenjaminBossan added a commit to huggingface/peft that referenced this pull request Apr 20, 2026
The model structure of Clip and similar models sharing the architecture
has changed in Transformers. See:

- huggingface/transformers#45361
- huggingface/transformers#45448

The test was updated to reflect the change.
@BenjaminBossan
Member

@Cyrilvallez Unfortunately, this PR breaks the PEFT weight conversion code, e.g.:

RUN_SLOW=1 pytest tests/peft_integration/test_peft_integration.py -k test_mixtral_lora_conversion

The error is

                    # Instantiate a new object that correctly post process patterns if needed
>                   new_conversion = orig_conversion.__class__(
                        source_patterns=new_source_patterns,
                        target_patterns=new_target_patterns,
                        distributed_operation=orig_conversion.distributed_operation,
                        quantization_operation=orig_conversion.quantization_operation,
                        operations=peft_weight_operations,
                    )
E                   TypeError: WeightConverter.__init__() got an unexpected keyword argument 'distributed_operation'

src/transformers/integrations/peft.py:297: TypeError

in the build_peft_weight_mapping function.

lvliang-intel pushed a commit to lvliang-intel/transformers that referenced this pull request Apr 21, 2026
…gingface#45448)

* try

* fix

* oupsi typo

* oupsi typo

* get rid of dataclasses

* try

* oupsi

* revert from before

* fix

* add parenthesis

* fix

* fix

* fixes

* need to revert the order for saving

* comment

* a bit more general

* simplify

* start adding tests

* typo

* fix dot

* fix

* more tests

* add harder tests

* fix

* improve tests

* comment

* doc

* skip in tests

* fix cohere_asr mapping

* add other needed models to mapping

* add text mappings

* add back

* better comment

* simplify

* remove overriden test

* deduplicate doc
artem-spector pushed a commit to artem-spector/transformers that referenced this pull request Apr 21, 2026
…gingface#45448)

@BenjaminBossan
Member

@Cyrilvallez I retested with the latest main branch and now the error is gone. I assume it's your doing, so thanks!

@Cyrilvallez
Member Author

Hey @BenjaminBossan! Are you actually sure it's fixed? Since it's not a dataclass anymore, I think the __init__ is not aligned anymore 🥲...
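
A made-up minimal example of the mismatch (not the actual WeightConverter code): the dataclass-generated __init__ accepted every field as a keyword argument, while a hand-written one may not.

from dataclasses import dataclass

@dataclass
class ConverterAsDataclass:
    source_patterns: tuple = ()
    distributed_operation: object = None  # auto-generated __init__ accepts this keyword

class ConverterPlain:
    __slots__ = ("source_patterns", "distributed_operation")

    def __init__(self, source_patterns=()):  # keyword no longer accepted
        self.source_patterns = source_patterns
        self.distributed_operation = None

ConverterAsDataclass(source_patterns=(), distributed_operation=None)  # works
try:
    ConverterPlain(source_patterns=(), distributed_operation=None)
except TypeError as err:
    print(err)  # unexpected keyword argument 'distributed_operation'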

@BenjaminBossan
Member

@Cyrilvallez Damn, you're right, it's not fixed. I must have missed it in the flood of other errors :D. Is there an actual fix on the way?

@Cyrilvallez
Member Author

Opening a PR rn

BenjaminBossan added a commit to huggingface/peft that referenced this pull request Apr 29, 2026
After a change in
huggingface/transformers#45448, weight
conversion tests started failing. Transformers provided a fix in
huggingface/transformers#45622 but it needs to
be ported to PEFT too.

This PR, together with the Transformers fix, resolves the issue.
