[loading] Clean way to add/remove full parts in checkpoint names #45448
Cyrilvallez merged 37 commits into main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
`@dataclass(slots=True)`
They were dataclasses but it did not make much sense, so I removed it (but kept the slots, the only feature we were really using from dataclass). This makes it much easier to inherit from them, etc.
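For context, a minimal sketch of the before/after pattern (the field names here are illustrative, not the actual ones):

```python
from dataclasses import dataclass

# Before: a dataclass with slots. Subclassing is awkward because every
# subclass must itself be a slotted dataclass for slots to keep working.
@dataclass(slots=True)
class TransformAsDataclass:
    pattern: str
    replacement: str

# After: a plain class that keeps __slots__ (the only dataclass feature that
# was actually used) while making inheritance straightforward.
class TransformAsPlainClass:
    __slots__ = ("pattern", "replacement")

    def __init__(self, pattern: str, replacement: str):
        self.pattern = pattern
        self.replacement = replacement
```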
Thanks for addressing this issue, @Cyrilvallez. I have tested the PR and noticed that the loaded weights get renamed. Is this renaming intended? This is the reason why our tests are still failing, as they rely on the previous name nesting structure.
@albertvillanova The renaming is fully intended in order to load the weights. Since the module was removed, the weights need to be renamed to match the actual module graph. However, if you resave the weights afterwards, the names should be reverted back to the original ones. Could you please point me towards the exact code that triggers the issue? From my own tests, everything was good.
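A tiny sketch of that round trip (the key and prefix below are made up for illustration):

```python
# At load time, the checkpoint key is renamed to match the new module graph...
checkpoint_key = "vision_model.encoder.layers.0.weight"
runtime_key = checkpoint_key.removeprefix("vision_model.")

# ...and at save time the inverse rename is applied, so the file on disk
# keeps the original key.
assert "vision_model." + runtime_key == checkpoint_key
```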
If you compare initial weights to
@Cyrilvallez this is the test code: https://github.com/huggingface/trl/blob/a09320e384461bc2a1bf301578bdc2c71fdc91b5/tests/test_dpo_trainer.py#L1085-L1102

```python
previous_trainable_params = {n: param.clone() for n, param in trainer.model.named_parameters()}
trainer.train()

for n, param in previous_trainable_params.items():
    if model_id == "trl-internal-testing/tiny-LlavaForConditionalGeneration" and "model.vision_tower.vision_model.encoder.layers.1" in n:
        continue
```
Ha I see, it's what I thought. You indeed need to change the hardcoded name on your side in this case: the weight names in the model itself have changed and cannot be changed back, since the architecture was modified. All parameter names inspected on-the-fly will have their new names.
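If the new name is unknown, one quick way to discover it is to list the live parameter names from the model in the test above (sketch):

```python
# Print the vision-tower parameters to see the post-refactor naming.
for name, _ in trainer.model.named_parameters():
    if "vision_tower" in name:
        print(name)
```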
[For maintainers] Suggested jobs to run (before merge): run-slow: altclip
@Cyrilvallez, thanks a lot for your clear explanation. This confirms my initial guess in relation to the downstream huggingface/trl#5497 (comment).
In relation to your comment, this also confirms my original fix.
Thanks, @Cyrilvallez! 🤗
Got offline approval from @vasqu and @zucchini-nlp! Merging! |
The model structure of CLIP and similar models sharing the architecture has changed in Transformers. See:
- huggingface/transformers#45361
- huggingface/transformers#45448

The test was updated to reflect the change.
@Cyrilvallez Unfortunately, this PR breaks the PEFT weight conversion code, e.g.:
The error is in the
…gingface#45448)

* try
* fix
* oupsi typo
* oupsi typo
* get rid of dataclasses
* try
* oupsi
* revert from before
* fix
* add parenthesis
* fix
* fix
* fixes
* need to revert the order for saving
* comment
* a bit more general
* simplify
* start adding tests
* typo
* fix dot
* fix
* more tests
* add harder tests
* fix
* improve tests
* comment
* doc
* skip in tests
* fix cohere_asr mapping
* add other needed models to mapping
* add text mappings
* add back
* better comment
* simplify
* remove overriden test
* deduplicate doc
@Cyrilvallez I retested with the latest main branch and now the error is gone. I assume it's your doing, so thanks!
Hey @BenjaminBossan! Are you actually sure it's fixed? Since it's not a dataclass anymore, I think the error should still be there.
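If the failure is what this comment suggests, a minimal repro would look like this (my assumption based on the thread, not the actual PEFT code):

```python
import dataclasses

class NotADataclass:
    __slots__ = ("x",)

    def __init__(self, x):
        self.x = x

# dataclasses.replace() only accepts dataclass instances, so code that still
# calls it on the converted (now plain) classes fails:
dataclasses.replace(NotADataclass(1), x=2)
# TypeError: replace() should be called on dataclass instances
```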
@Cyrilvallez Damn, you're right, it's not fixed. I must have missed it in the flood of other errors :D. Is there an actual fix on the way?
Opening a PR rn |
After a change in huggingface/transformers#45448, weight conversion tests started failing. Transformers provided a fix in huggingface/transformers#45622 but it needs to be ported to PEFT too. This PR, together with the Transformers fix, resolves the issue.
What does this PR do?
As per the title.
The issue
The problem is that transforms that want to remove a full part of a weight name (such as a prefix, e.g. a `model.` at the start) are non-bijective in general, i.e. we completely lose the information once the part is dropped. So adding it back later when saving is impossible without runtime information about the checkpoint that was used: we need to know whether the prefix was there before or not, and we cannot infer it from anything else.
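A concrete example of the lost information (the `model.` prefix here is illustrative):

```python
import re

def remove_prefix(name: str) -> str:
    # Transform used at load time: drop a leading "model." if present.
    return re.sub(r"^model\.", "", name)

# Two different checkpoint spellings collapse to the same runtime name:
remove_prefix("model.embed_tokens.weight")  # -> "embed_tokens.weight"
remove_prefix("embed_tokens.weight")        # -> "embed_tokens.weight"
# At save time there is no way to know which spelling the original
# checkpoint used, unless we recorded whether the transform actually fired.
```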
Proposed solution

This PR adds a simple mechanism for this: each WeightTransform now carries a simple flag describing whether it was actually used to rename a weight or not. If it was, we keep it when saving the transforms on the model (this was already performed before). If not, we drop it, so that it is not applied when resaving.
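A minimal sketch of that mechanism (attribute and method names are mine, not necessarily the PR's):

```python
import re

class WeightTransformSketch:
    __slots__ = ("pattern", "replacement", "was_applied")

    def __init__(self, pattern: str, replacement: str):
        self.pattern = pattern
        self.replacement = replacement
        # Records whether this transform actually renamed anything at load
        # time; only transforms that fired are kept (and inverted) on save.
        self.was_applied = False

    def apply(self, name: str) -> str:
        new_name, n_subs = re.subn(self.pattern, self.replacement, name)
        if n_subs:
            self.was_applied = True
        return new_name
```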
It also introduces the `PrefixChange` class (a simple class inheriting from `WeightRenaming`) to simplify the addition/removal of full name parts, because otherwise the regexes needed in such cases are hard to read/write.
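For comparison, doing the same with raw regexes quickly gets unreadable (the patterns below are a sketch, not the PR's actual ones):

```python
import re

name = "model.embed_tokens.weight"
# Removal of a full leading part on load...
stripped = re.sub(r"^model\.", "", name)
# ...and its re-addition on save both need carefully anchored patterns that
# are easy to get subtly wrong; a PrefixChange expresses the intent directly.
restored = re.sub(r"^", "model.", stripped, count=1)
assert restored == name
```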