Fix vlm weight mappings #45358
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
WeightRenaming(source_patterns=r"^vision_tower", target_patterns="model.vision_tower"),
WeightRenaming(source_patterns=r"^multi_modal_projector", target_patterns="model.multi_modal_projector"),
oh shit, another base-model-prefix. We really need a proper way to add and delete the prefix when saving the model
These were supposed to be grabbed from each model's base_model_prefix, which apparently worked only when loading 😓
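For context, a minimal sketch of how regex-based renaming like `WeightRenaming` can be applied in both directions (the helper names below are hypothetical, not the actual transformers internals):

```python
import re

# Illustrative only: the real WeightRenaming conversion op in
# transformers handles far more cases than this sketch.
RENAMES = [
    (r"^vision_tower", "model.vision_tower"),
    (r"^multi_modal_projector", "model.multi_modal_projector"),
]

def rename_for_loading(key: str) -> str:
    """Checkpoint (hub) key -> in-memory model key."""
    for pattern, target in RENAMES:
        key = re.sub(pattern, target, key)
    return key

def rename_for_saving(key: str) -> str:
    """In-memory model key -> checkpoint key: the reverse mapping
    that must hold for serialization to round-trip."""
    for pattern, target in RENAMES:
        key = re.sub("^" + re.escape(target), pattern.lstrip("^"), key)
    return key

# The bug class discussed here is exactly a broken round-trip:
k = "vision_tower.encoder.layer.0.weight"
assert rename_for_saving(rename_for_loading(k)) == k
```

If only the loading direction is exercised by tests, a missing or wrong reverse pattern loads fine but serializes keys in the wrong format, which is why it went unnoticed.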
and a proper test would be nice for saving; currently we just assume that test_reverse_mapping checks the serialized keys
[For maintainers] Suggested jobs to run (before merge) run-slow: aya_vision, cohere_asr, colpali, colqwen2, emu3, fuyu, gemma3, got_ocr2, gpt_oss, internvl, llava, llava_next, llava_next_video, llava_onevision, mistral3, mllama
Also, I want to flag that this tone isn't okay with me. Please keep feedback constructive and focused on the code, not the person.
Oh, very sorry about the tone, I did not mean to be mean or anything. I just wanted to flag that those parts are absolutely critical and not really tested (a meaningful test would require knowledge of the serialized hub weights, which is not really feasible with the small "dummy" models we use for tests), so we need to be extremely careful about them.
Tested as much as I could on the real weights format, and saving/loading give the same serialization format! Merging!
* fix
* style
* comment
* comment
* remove gemma3n - should never have been there
* fix much more.......
* skip test for base models
* revert unwanted changes
* fix
What does this PR do?
Fix #45357 finally. This was not caught in the previous fix, as the model can be reloaded correctly by
from_pretrained, but keys are still wrongly serialized! After a deeper look, I noticed @zucchini-nlp did not correctly copy the mappings in #44627... This is EXTREMELY IMPORTANT and can very easily silently break loading and/or saving @zucchini-nlp - we cannot touch the mappings without being 100% sure of the change. You missed a lot before.
In this case, most models would load correctly but would not re-save in the same format.
For ref, I'm using this small snippet to check formats:
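(The original snippet is not reproduced in this thread; below is an illustrative stand-in for that kind of check. The workflow in the comments is an assumption, not the author's actual code.)

```python
def key_prefixes(keys, depth=2):
    """Summarize a checkpoint's layout by its leading key components,
    e.g. 'model.vision_tower' vs. a bare 'vision_tower'."""
    return sorted({".".join(k.split(".")[:depth]) for k in keys})

# With real weights, one would compare the hub checkpoint's prefixes
# against those of a freshly re-saved model, roughly:
#   hub_keys   = keys read from the original safetensors index on the hub
#   saved_keys = keys of the files written by model.save_pretrained(tmp)
#   assert key_prefixes(hub_keys) == key_prefixes(saved_keys)

example = [
    "model.vision_tower.conv.weight",
    "model.multi_modal_projector.linear_1.weight",
    "lm_head.weight",
]
print(key_prefixes(example))
```

A prefix-level diff like this surfaces exactly the failure mode in this PR: keys that load fine but come back out with a different (e.g. missing `model.`) prefix.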