[loading] Clean way to add/remove full parts in checkpoint names #45448
Cyrilvallez merged 37 commits into main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
`@dataclass(slots=True)`
They were dataclasses but it did not make much sense, so I removed it (but kept the slots, the only feature we were really using from dataclass). This makes it much easier to inherit from them, etc.
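For context, a minimal sketch of the before/after pattern (the field names here are illustrative, not the actual ones):

```python
from dataclasses import dataclass

# Before: a dataclass with slots. Subclassing is awkward because every
# subclass must itself be a slotted dataclass for slots to keep working.
@dataclass(slots=True)
class TransformAsDataclass:
    pattern: str
    replacement: str

# After: a plain class that keeps __slots__ (the only dataclass feature that
# was actually used) while making inheritance straightforward.
class TransformAsPlainClass:
    __slots__ = ("pattern", "replacement")

    def __init__(self, pattern: str, replacement: str):
        self.pattern = pattern
        self.replacement = replacement
```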
Thanks for addressing this issue, @Cyrilvallez. I have tested the PR and noticed that the loaded weights get renamed. Is this renaming intended? This is the reason why our tests are still failing, as they rely on the previous name nesting structure.
@albertvillanova The renaming is fully intended in order to load the weights. Since the module was removed, the weights need to be renamed to match the actual module graph. However, if you resave the weights afterwards, the names should be reverted back to the original ones. Could you please point me towards the exact code that triggers the issue? From my own tests, everything was good.
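A tiny sketch of that round trip (the key and prefix below are made up for illustration):

```python
# At load time, the checkpoint key is renamed to match the new module graph...
checkpoint_key = "vision_model.encoder.layers.0.weight"
runtime_key = checkpoint_key.removeprefix("vision_model.")

# ...and at save time the inverse rename is applied, so the file on disk
# keeps the original key.
assert "vision_model." + runtime_key == checkpoint_key
```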
If you compare initial weights to
@Cyrilvallez this is the test code: https://github.com/huggingface/trl/blob/a09320e384461bc2a1bf301578bdc2c71fdc91b5/tests/test_dpo_trainer.py#L1085-L1102

```python
previous_trainable_params = {n: param.clone() for n, param in trainer.model.named_parameters()}
trainer.train()

for n, param in previous_trainable_params.items():
    if model_id == "trl-internal-testing/tiny-LlavaForConditionalGeneration" and "model.vision_tower.vision_model.encoder.layers.1" in n:
        continue
```
Ha I see, it's what I thought. You indeed need to change the hardcoded name on your side in this case: the weight names in the model itself have changed and cannot be changed back, since the architecture was modified. All parameter names inspected on-the-fly will have their new names.
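If the new name is unknown, one quick way to discover it is to list the live parameter names from the model in the test above (sketch):

```python
# Print the vision-tower parameters to see the post-refactor naming.
for name, _ in trainer.model.named_parameters():
    if "vision_tower" in name:
        print(name)
```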
[For maintainers] Suggested jobs to run (before merge): run-slow: altclip
@Cyrilvallez, thanks a lot for your clear explanation. This confirms my initial guess in relation to the downstream huggingface/trl#5497 (comment).
In relation to your comment, this also confirms my original fix.
Thanks, @Cyrilvallez! 🤗
Got offline approval from @vasqu and @zucchini-nlp! Merging! |
The model structure of CLIP and similar models sharing the architecture has changed in Transformers. See:
- huggingface/transformers#45361
- huggingface/transformers#45448

The test was updated to reflect the change.
@Cyrilvallez Unfortunately, this PR breaks the PEFT weight conversion code, e.g.:
The error is in the
…gingface#45448)

* try
* fix
* oupsi typo
* oupsi typo
* get rid of dataclasses
* try
* oupsi
* revert from before
* fix
* add parenthesis
* fix
* fix
* fixes
* need to revert the order for saving
* comment
* a bit more general
* simplify
* start adding tests
* typo
* fix dot
* fix
* more tests
* add harder tests
* fix
* improve tests
* comment
* doc
* skip in tests
* fix cohere_asr mapping
* add other needed models to mapping
* add text mappings
* add back
* better comment
* simplify
* remove overriden test
* deduplicate doc
@Cyrilvallez I retested with the latest main branch and now the error is gone. I assume it's your doing, so thanks!
Hey @BenjaminBossan! Are you actually sure it's fixed? Since it's not a dataclass anymore, I think the error should still be there.
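If the failure is what this comment suggests, a minimal repro would look like this (my assumption based on the thread, not the actual PEFT code):

```python
import dataclasses

class NotADataclass:
    __slots__ = ("x",)

    def __init__(self, x):
        self.x = x

# dataclasses.replace() only accepts dataclass instances, so code that still
# calls it on the converted (now plain) classes fails:
dataclasses.replace(NotADataclass(1), x=2)
# TypeError: replace() should be called on dataclass instances
```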
@Cyrilvallez Damn, you're right, it's not fixed. I must have missed it in the flood of other errors :D. Is there an actual fix on the way?
Opening a PR rn |
After a change in huggingface/transformers#45448, weight conversion tests started failing. Transformers provided a fix in huggingface/transformers#45622 but it needs to be ported to PEFT too. This PR, together with the Transformers fix, resolves the issue.
What does this PR do?
As per the title.
The issue
The problem is that transforms that want to remove a full part of a weight name (such as a prefix, e.g. a `model.` at the start) are non-bijective in general, i.e. we completely lose the information once the part is dropped. So adding it back later when saving is impossible without runtime information about the checkpoint that was used: we need to know whether the prefix was there before or not, and we cannot infer it from anything else.
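A concrete example of the lost information (the `model.` prefix here is illustrative):

```python
import re

def remove_prefix(name: str) -> str:
    # Transform used at load time: drop a leading "model." if present.
    return re.sub(r"^model\.", "", name)

# Two different checkpoint spellings collapse to the same runtime name:
remove_prefix("model.embed_tokens.weight")  # -> "embed_tokens.weight"
remove_prefix("embed_tokens.weight")        # -> "embed_tokens.weight"
# At save time there is no way to know which spelling the original
# checkpoint used, unless we recorded whether the transform actually fired.
```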
Proposed solution

This PR adds a simple mechanism for this: each WeightTransform now carries a simple flag describing whether it was actually used to rename a weight or not. If it was, we keep it when saving the transforms on the model (this was already performed before). If not, we drop it, so that it is not applied when resaving.
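A minimal sketch of that mechanism (attribute and method names are mine, not necessarily the PR's):

```python
import re

class WeightTransformSketch:
    __slots__ = ("pattern", "replacement", "was_applied")

    def __init__(self, pattern: str, replacement: str):
        self.pattern = pattern
        self.replacement = replacement
        # Records whether this transform actually renamed anything at load
        # time; only transforms that fired are kept (and inverted) on save.
        self.was_applied = False

    def apply(self, name: str) -> str:
        new_name, n_subs = re.subn(self.pattern, self.replacement, name)
        if n_subs:
            self.was_applied = True
        return new_name
```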
It also introduces the `PrefixChange` class (a simple class inheriting from `WeightRenaming`) to simplify the addition/removal of full name parts, because otherwise the regexes needed in such cases are hard to read/write.
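For comparison, doing the same with raw regexes quickly gets unreadable (the patterns below are a sketch, not the PR's actual ones):

```python
import re

name = "model.embed_tokens.weight"
# Removal of a full leading part on load...
stripped = re.sub(r"^model\.", "", name)
# ...and its re-addition on save both need carefully anchored patterns that
# are easy to get subtly wrong; a PrefixChange expresses the intent directly.
restored = re.sub(r"^", "model.", stripped, count=1)
assert restored == name
```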