bugfix: propagate weight key_mapping to peft to fix 4.52 VLM renaming #38627

Cyrilvallez merged 4 commits into huggingface:main
Conversation
zucchini-nlp
left a comment
Personally, it looks to be doing the same key rename as when loading into base transformers, though I am not very familiar with how PEFT works. cc @BenjaminBossan for that
For saving, yeah, if it could be saved in the new format and still loaded back correctly, that would be the preferred way. However, transformers currently saves the base model back in the previous-style state dict; I am planning to gradually remove that.
I mean, this is not even PEFT; this code works without the peft library. Very intuitively, it just changes the names of the modules the adapters attach to, matching the code you wrote to rename the params in the base model!
BenjaminBossan
left a comment
Just to ensure that my understanding is correct: The structure of some VLM architectures has changed. Therefore, we need to remap in order to allow old checkpoints to still be loaded. This requires a mapping dict to be applied, but this dict was not applied when using model.load_adapter. This PR fixes that.
Generally, this LGTM. Ideally, I'd like to also see a unit test to ensure this works and won't break in the future. This would require an "old style" checkpoint, so maybe not that easy to implement, perhaps @zucchini-nlp has some tips.
A question that came up: would this change not mean that we have to apply a similar mapping in PEFT too (and similarly in all external packages that create checkpoints on top of transformers VLMs)?
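As a rough illustration of what such a mapping dict does, here is a minimal sketch; the patterns below are made up, not the exact per-model mappings shipped by transformers:

```python
import re

# Hypothetical regex mapping in the spirit of _checkpoint_conversion_mapping;
# the real per-model patterns in transformers differ.
key_mapping = {
    r"^model\.": "model.language_model.",  # LM weights move under language_model
    r"^visual\.": "model.visual.",         # vision tower moves under the top-level model
}

def remap_state_dict_keys(state_dict, key_mapping):
    """Rewrite each key using the first regex pattern that matches it."""
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for pattern, replacement in key_mapping.items():
            new_key, n_subs = re.subn(pattern, replacement, key)
            if n_subs > 0:
                break
        remapped[new_key] = value
    return remapped

old_style = {
    "model.layers.0.self_attn.q_proj.weight": "...",
    "visual.patch_embed.proj.weight": "...",
}
print(remap_state_dict_keys(old_style, key_mapping))
# {'model.language_model.layers.0.self_attn.q_proj.weight': '...',
#  'model.visual.patch_embed.proj.weight': '...'}
```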
@BenjaminBossan yes, exactly correct! I made the requested changes. I'll let @zucchini-nlp implement the test; I'm on vacation until mid-June and, as you say, not sure how best to proceed. As for your question, this is exactly my case: I maintain one of the affected downstream libraries. The code I wrote here should enable all PEFT checkpoints linked to models in the VLMs list (where an explicit mapping dictionary and remapping are specified) to function. I'm not sure you need to do anything on the PEFT side in your case. BTW @zucchini-nlp, the verifications that are made to resolve the video preprocessor also don't consider the case of adapters that don't have a config.json in the repo itself (but that's a different issue altogether). I felt fixing the loading was the most urgent in all cases, because thousands of models have probably become obsolete.
Reverting back to the way it was.
Hmm, but PEFT itself does not go through transformers when saving checkpoints.
Since PEFT will be applied on top of the new architecture, the PEFT checkpoint itself will also assume the new architecture. IIUC, this would mean that if a user saves a PEFT checkpoint with the new architecture, and another user with an older transformers version from before #37033 tries to load it, it will fail (no forward compatibility). Is that right @zucchini-nlp?
Ah yes, sorry.
I don't know PEFT well, so you're surely right. The mapping regexes make it so that both ways of saving should load back correctly (they anchor at the beginning of the keys, etc.), so it should be okay on that end. I'm doing it with colqwen2-v1.0 personally, but I'm guessing some generative qwen2-vl fine-tunes should be affected too, lemme check!
Looks like that breaks more things than we expected. My expectation was that almost all external libraries that use transformers model classes would load checkpoints with `from_pretrained`.
This one therefore should not be a problem, if we ignore the adapters for now. I believe the solution in this PR, which re-maps keys to their new style, would make it work for PEFT in the same way as in bare transformers. Regarding the PEFT library, I am not sure whether you need any changes, since I am not an expert. We can look at it together and check.
```python
model.hf_quantizer = hf_quantizer
...
if _adapter_model_path is not None:
    adapter_kwargs["key_mapping"] = key_mapping
```
this will use any key_mapping provided by users, which we probably don't want yet. Let's check if it is a VLM and apply it only in those cases.
Please, let's not hardcode any more things :/ User-specified key mappings should be propagated as well, I believe.
It's the only way to propagate the mapping for all models that inherit from VLM classes but are not in your hardcoded list. This is not a rare setting, and currently, passing a key_mapping to from_pretrained is the only viable workaround after the version bump.
Typically, in ColQwen2, I now have to do this:

```python
class ColQwen2(Qwen2VLForConditionalGeneration):
    @classmethod
    def from_pretrained(cls, *args, **kwargs):
        key_mapping = kwargs.pop("key_mapping", None)
        if key_mapping is None:
            key_mapping = super()._checkpoint_conversion_mapping
        return super().from_pretrained(*args, **kwargs, key_mapping=key_mapping)
```
I agree that hardcoding is suboptimal; it was done for loading/saving to prevent users from using self._checkpoint_conversion_mapping as a hacky workaround in their models. We don't aim to maintain self._checkpoint_conversion_mapping for all models and want to restrict it to VLMs only, for BC.
About applying key_mapping to all models, I am pinging core maintainer @Cyrilvallez. Not sure whether that was intended behavior when adding key_mapping or not. Let's make sure it aligns with the other maintainers' plans :)
Hey! Indeed, key_mapping is public and supposed to be used at the user's discretion. However, it's a niche arg, and we don't expect many people to use it. But it should still be propagated correctly in all cases, as is being done right now.
IMO, VLMs should have merged the potential mapping passed by the user into their own, instead of overriding it, but once again, as we don't expect people to use the arg except in really particular situations, it's not a big issue 👌
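For illustration, the merge described here could look like this (a hypothetical sketch; both mappings are made up):

```python
# Hypothetical: combine the model's own conversion mapping with a
# user-supplied one, letting the user's entries take precedence
# instead of discarding the model's mapping entirely.
model_mapping = {r"^model\.": "model.language_model."}  # model's own (illustrative)
user_mapping = {r"^my_head\.": "score_head."}           # passed by the user (illustrative)

merged_key_mapping = {**model_mapping, **user_mapping}
```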
@BenjaminBossan Here's a minimal example with a model with a non-modified architecture (and a PEFT adapter):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

# Load processor and model
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "lightonai/MonoQwen2-VL-v0.1",
    device_map="cuda",
    # attn_implementation="flash_attention_2",
    # torch_dtype=torch.bfloat16,
)
```
Thanks for the pointer. I ran my own tests to check all directions, using transformers versions from before and after the change:

```python
import os

from transformers import Qwen2VLForConditionalGeneration
from peft import LoraConfig, PeftModel, get_peft_model

model_id = "Qwen/Qwen2-VL-2B-Instruct"
path = "/tmp/peft/38627"

# Load processor and model
model = Qwen2VLForConditionalGeneration.from_pretrained(model_id)

if not os.path.exists(path + "/adapter_model.safetensors"):
    # Define LoRA configuration
    lora_config = LoraConfig(target_modules=["q_proj", "v_proj"], init_lora_weights=False)
    # Create PEFT model
    peft_model = get_peft_model(model, lora_config)
    peft_model.save_pretrained(path)
    print("no existing PEFT model found, created and saved new one at", path)
else:
    # Load existing PEFT model
    peft_model = PeftModel.from_pretrained(model, path)
    print("loaded existing PEFT model from", path)
```

Indeed, there are now issues when the version used to save the PEFT checkpoint and the version used to load it are not identical:
To explain why this happens: PEFT adds new modules on top of the base model and also wraps it. So the PEFT state_dict keys carry an extra "base_model.model." prefix on top of the base model's keys, which means PEFT checkpoints inherit the architecture change. I think what we would need to do in PEFT is to apply the same mapping as in this PR to the PEFT checkpoint. I'm not sure if we have access to _checkpoint_conversion_mapping from within PEFT.
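A rough sketch of such a PEFT-side remap, under the assumption that every relevant key carries the "base_model.model." prefix (this is not the actual PEFT patch):

```python
import re

PEFT_PREFIX = "base_model.model."

def remap_peft_state_dict(peft_state_dict, key_mapping):
    """Strip the PEFT wrapper prefix, apply the base model's regex mapping
    (e.g. its _checkpoint_conversion_mapping), then re-add the prefix."""
    remapped = {}
    for key, value in peft_state_dict.items():
        if key.startswith(PEFT_PREFIX):
            inner = key[len(PEFT_PREFIX):]
            for pattern, replacement in key_mapping.items():
                inner, n_subs = re.subn(pattern, replacement, inner)
                if n_subs > 0:
                    break
            key = PEFT_PREFIX + inner
        remapped[key] = value
    return remapped
```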
Thanks for investigating! In the main lib, conversion is done by hardcoding a list of models for which we want to use their specified mapping attribute (_checkpoint_conversion_mapping). This might be a bit independent from this PR though, since it's in PEFT code?
Yes, this is true, I prototyped a patch that does that and it indeed works, making #37033 backwards compatible with older checkpoints. However, this still leaves the problem of using a checkpoint from after #37033 with a transformers version from before (forwards compatibility). This doesn't work as the (inverse) mapping does not exist in older transformers versions. I also don't think that there is a nice way to detect this so that we can instruct users to upgrade transformers, we could only try to make guesses. IMO this is not very nice, as I imagine this could be a real issue for users. @zucchini-nlp do you have an idea how to address this?
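For what it's worth, if the attribute were available on the saving side, a reverse mapping could in principle be derived from it. A naive sketch, assuming each entry is a plain ^-anchored prefix rewrite (real entries can be more complex regexes, so this is illustrative only):

```python
import re

def invert_prefix_mapping(key_mapping):
    """Swap pattern and replacement, assuming each entry is a plain
    ^-anchored prefix rewrite (real mappings can be fancier regexes)."""
    inverse = {}
    for pattern, replacement in key_mapping.items():
        old_prefix = pattern.lstrip("^").replace("\\", "")  # e.g. r"^model\." -> "model."
        inverse["^" + re.escape(replacement)] = old_prefix
    return inverse

print(invert_prefix_mapping({r"^model\.": "model.language_model."}))
# {'^model\\.language_model\\.': 'model.'}
```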
Yes, sorry for spamming this PR. I can create a separate issue.
Very cool! Haha, no problem at all ofc. I wasn't sure, actually, since I didn't quite look at the scope of the "integrations.peft" transformers code. Thanks a ton for looking into all of this!
Follow up to huggingface#2554. See discussion in huggingface/transformers#38627. To quote:

> transformers PR #37033 re-arranges the way visual language models are built by moving the LM head from the language model to the top-level VLM (among other things). A consequence of this is that the keys in the PEFT state_dict now also follow the new architecture. This means that:
>
> 1. If a PEFT checkpoint was saved with the old architecture but is loaded with the new architecture, loading fails.
> 2. If a PEFT checkpoint was saved with the new architecture but is loaded with the old architecture, loading fails.

1. can be addressed by making use of the newly added _checkpoint_conversion_mapping attribute for models with the new architecture. In transformers, this is used to map old model state_dicts to the new state_dict format. In PEFT, with some fiddling, we can use the same mapping to make old PEFT state_dicts compatible with the new architecture (backwards compatibility).

However, 2. is not easily addressed. We would need a reverse mapping for this. It could be easily derived from _checkpoint_conversion_mapping, but since this attribute doesn't exist on old models, we cannot do that. Therefore, new checkpoints created with PEFT on these models won't load successfully when users use old transformers (forward compatibility).

These cases are covered by the added unit tests, which means that the test covering case 2 currently fails. If we could reliably detect that we are in case 2, we could warn the user and advise them to upgrade transformers, but I don't know if it's possible to figure this out.

Note that we skip prompt learning methods when applying the mapping. This is because they don't have the "base_model.model." prefix, which we need to remove before mapping; they instead just use "base_model.". We could remove only "base_model.", but the subsequent sub-module could also be called "model", resulting in what looks like "base_model.model.". To avoid this confusion, we skip prompt learning. Since it is applied to the language model part directly and applies itself on the outer model (unlike LoRA et al), skipping should be fine.

We also allow users to pass their own key_mapping to from_pretrained and load_adapter, though the documentation advises against it. This argument could theoretically be used as a workaround in case there is indeed an issue with prompt learning state_dicts.

Apart from these changes, I also made a small change to account for huggingface/transformers#38017 (comment).
I created a PR to address this in PEFT: huggingface/peft#2574. I used fake mini models to mimic the old and new model architectures for testing. The tests are still failing because of the issue I mentioned above.
Hello @zucchini-nlp, any news?
Let's wait for a core maintainer's review, ping @Cyrilvallez :)
Hey! I must say I really like this, it's true that I did not think of the adapters when introducing key_mapping!
However, this does not solve the issue at hand in all cases: if people first load the model and then use load_adapter (i.e., outside of from_pretrained), it won't work with VLMs. So load_adapter should have the same default key_mapping as from_pretrained in general, to keep everything consistent!
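A simplified sketch of that default (hypothetical code, not the actual transformers implementation):

```python
class PeftAdapterMixinSketch:
    """Hypothetical stand-in for the adapter-loading mixin."""

    _checkpoint_conversion_mapping: dict = {}

    def load_adapter(self, peft_model_id, key_mapping=None, **kwargs):
        # Fall back to the model's own conversion mapping so that calling
        # load_adapter directly behaves like from_pretrained with an adapter.
        if key_mapping is None:
            key_mapping = getattr(self, "_checkpoint_conversion_mapping", None)
        # ... the rest would download the adapter, remap its state_dict keys
        # with `key_mapping`, and load the weights.
        return key_mapping
```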
Thanks a lot for taking a look @Cyrilvallez! Do you have any further comments @zucchini-nlp?
zucchini-nlp
left a comment
Nope, no comments. Thanks, let's merge!
What does this PR do?
Fixes behavior introduced in the 4.52 VLM refactor #37033, which renames VLM model weights but did not account for adapters.
(Theoretically, it seems the problem was just the non-propagation of key_mapping to PEFT, but it is made a lot more salient by the recent big changes in VLM naming, which basically deprecate all existing adapters.)
@zucchini-nlp
Note that this is just for loading; I'm not sure we want to generalize this to saving... I kind of feel we might just want people to save new adapters in the correct format going forward (and slowly convert all models to the new format while ensuring backward compatibility).