Fix vlm weight mappings #45358

Merged
Cyrilvallez merged 9 commits into main from fix-conversion
Apr 10, 2026

Conversation

@Cyrilvallez (Member) commented Apr 10, 2026

What does this PR do?

Fix #45357 finally. This was not caught by the previous fix, as the model can still be reloaded correctly by from_pretrained, but the keys are wrongly serialized!

After a deeper look, I noticed @zucchini-nlp did not correctly copy the mappings in #44627... These mappings are EXTREMELY IMPORTANT and can very easily silently break loading and/or saving @zucchini-nlp - we cannot touch them without being 100% sure of the change. A lot was missed before.
In this case, most weights would load correctly, but would not be re-saved in the same format.
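To make the failure mode concrete, here is a hedged, self-contained sketch (the regex patterns are taken from this PR's diff, but the two functions are illustrative, not the actual transformers internals) of why a load-time rename needs an exact reverse at save time:

```python
import re

# Hypothetical one-way rename table: maps serialized checkpoint keys to
# in-memory module names when loading.
LOAD_RENAMES = [
    (r"^vision_tower", "model.vision_tower"),
    (r"^multi_modal_projector", "model.multi_modal_projector"),
]

def rename_on_load(key: str) -> str:
    # Apply every rename pattern to a serialized key.
    for pattern, target in LOAD_RENAMES:
        key = re.sub(pattern, target, key)
    return key

def rename_on_save(key: str) -> str:
    # The reverse direction: strip the prefix that was added on load.
    # If this reverse mapping is missing or wrong, the model loads fine
    # but re-serializes under different key names - exactly the bug here.
    for pattern, target in LOAD_RENAMES:
        key = re.sub(rf"^{re.escape(target)}", pattern.lstrip("^"), key)
    return key

original = "vision_tower.encoder.layers.0.attn.q_proj.weight"
loaded = rename_on_load(original)      # "model.vision_tower.encoder...."
assert rename_on_save(loaded) == original  # round trip preserves the format
```

The point is that the two directions must be exact inverses: any mapping touched only on the load path silently changes the serialized format on the next save_pretrained.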

For ref, I'm using this small snippet to check formats:

import transformers
from transformers import LlavaForConditionalGeneration, LlavaNextForConditionalGeneration
from safetensors.torch import load_file
from transformers.utils.hub import cached_file, cached_files
import json

# model_id = "llava-hf/llava-1.5-7b-hf"
model_id = "llava-hf/llava-v1.6-mistral-7b-hf"
# model_id = "adept/fuyu-8b"
model_class = transformers.LlavaNextForConditionalGeneration

target_folder = "/raid/cyril/test_model"

# Download the shard index and every shard file of the original checkpoint
with open(cached_file(model_id, "model.safetensors.index.json")) as f:
    index = json.load(f)
model_files = set(index["weight_map"].values())
model_files = cached_files(model_id, model_files)

# Rebuild the original serialized state dict from the shards
original_state_dict = {}
for file in model_files:
    original_state_dict.update(load_file(file))

# Round-trip: load the model, then re-save it
model = model_class.from_pretrained(model_id)
model.save_pretrained(target_folder)

saved_weights = load_file(f"{target_folder}/model.safetensors")

# Keys present in the original checkpoint but missing from the re-saved one
not_in_saved = []
for k, v in original_state_dict.items():
    if k not in saved_weights:
        not_in_saved.append(k)
    else:
        assert (v == saved_weights[k]).all()

# Keys present in the re-saved checkpoint but missing from the original
not_in_original = []
for k, v in saved_weights.items():
    if k not in original_state_dict:
        not_in_original.append(k)
    else:
        assert (v == original_state_dict[k]).all()

print(f"The following are in original but not in saved: {not_in_saved}")
print(f"The following are saved but not in original: {not_in_original}")

# Sanity check: the re-saved checkpoint still loads
model = model_class.from_pretrained(target_folder)

@Cyrilvallez Cyrilvallez added the for patch Tag issues / labels that should be included in the next patch label Apr 10, 2026
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Cyrilvallez Cyrilvallez changed the title Fix qwen3_5 key renaming Fix vlm weight mappings Apr 10, 2026
Comment on lines +89 to +90
WeightRenaming(source_patterns=r"^vision_tower", target_patterns="model.vision_tower"),
WeightRenaming(source_patterns=r"^multi_modal_projector", target_patterns="model.multi_modal_projector"),
Member

oh shit, another base-model-prefix. We really need a proper way to add and delete the prefix when saving the model

These were supposed to be grabbed from each model's base-model-prefix, which apparently worked only when loading 😓
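A sketch of the idea raised above - deriving both directions from a single base-model prefix so that saving is guaranteed to be the inverse of loading. Everything here is hypothetical (the prefix, the submodule names, and the helper functions are illustrative, not transformers API):

```python
# Assumed base-model prefix and the submodules that live under it.
BASE_PREFIX = "model"
PREFIXED_SUBMODULES = ("vision_tower", "multi_modal_projector")

def add_prefix(key: str) -> str:
    # Load direction: prepend the base prefix to bare submodule keys.
    head = key.split(".", 1)[0]
    if head in PREFIXED_SUBMODULES:
        return f"{BASE_PREFIX}.{key}"
    return key

def strip_prefix(key: str) -> str:
    # Save direction: remove the prefix only where add_prefix put it,
    # so strip_prefix(add_prefix(k)) == k for every key.
    rest = key[len(BASE_PREFIX) + 1:] if key.startswith(BASE_PREFIX + ".") else key
    if rest.split(".", 1)[0] in PREFIXED_SUBMODULES:
        return rest
    return key
```

Because both helpers read the same prefix and submodule list, there is no second table to keep in sync, which is the class of bug this PR fixes.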

Member

and a proper test would be nice for saving, as we currently just assume that test_reverse_mapping checks the serialized keys
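Such a saving test could be reduced to a pure key-set comparison, independent of actual weights. A minimal sketch (the helper name is hypothetical, not an existing transformers test):

```python
def check_roundtrip_keys(original_keys, saved_keys):
    """Fail if a load/save round trip changed the serialized key set."""
    missing = set(original_keys) - set(saved_keys)
    extra = set(saved_keys) - set(original_keys)
    assert not missing, f"keys lost on save: {sorted(missing)}"
    assert not extra, f"unexpected keys after save: {sorted(extra)}"
```

Fed with the key sets from the snippet in the PR description, this would have caught the regression: the re-saved checkpoint contained prefixed keys the original never had.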

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: aya_vision, cohere_asr, colpali, colqwen2, emu3, fuyu, gemma3, got_ocr2, gpt_oss, internvl, llava, llava_next, llava_next_video, llava_onevision, mistral3, mllama

@zucchini-nlp
Member

Also, I want to flag that this tone isn't okay with me. Please keep feedback constructive and focused on the code, not the person.

@Cyrilvallez
Member Author

Oh, very sorry about the tone, I did not mean to be mean or anything. I just wanted to flag that those parts are absolutely critical and not really tested (a meaningful test would require knowledge of the serialized hub weights, which is not really feasible with the small "dummy" models we use for tests). So we need to be extremely careful about them.

@Cyrilvallez
Member Author

Tested as much as I can on real weight formats, and saving/loading gives the same serialization format! Merging!

@Cyrilvallez Cyrilvallez merged commit f3a68c4 into main Apr 10, 2026
29 checks passed
@Cyrilvallez Cyrilvallez deleted the fix-conversion branch April 10, 2026 15:41
sirzechs66 pushed a commit to sirzechs66/transformers that referenced this pull request Apr 18, 2026
* fix

* style

* comment

* comment

* remove gemma3n - should never have been there

* fix much more.......

* skip test for base models

* revert unwanted changes

* fix

Labels

for patch Tag issues / labels that should be included in the next patch


Development

Successfully merging this pull request may close these issues.

[Regression] Qwen3.5 save_pretrained still saves incorrect visual encoder keys in 5.5.3

3 participants