[gemma4] Remove all shared weights, and silently skip them during loading#45336
[gemma4] Remove all shared weights, and silently skip them during loading#45336Cyrilvallez merged 5 commits intomainfrom
Conversation
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
vasqu
left a comment
There was a problem hiding this comment.
Some initial comments, 1 nit re ternaries (I don't think is easy to change because of the v proj) and 1 bigger point on the tied weights keys: Imo, we should have this as property under the pretrained model directly and avoid it in all those inits; especially since we can exctract them from the config 🤔 Maybe I'm overseeing something tho
| # Update `_keys_to_ignore_on_load_unexpected` to drop all k/v proj and norms for the shared layers | ||
| self._keys_to_ignore_on_load_unexpected = [] | ||
| for i, layer in enumerate(self.layers): | ||
| if layer.self_attn.is_kv_shared_layer: | ||
| self._keys_to_ignore_on_load_unexpected.extend( | ||
| [f"layers.{i}.self_attn.{name}" for name in ("k_proj", "v_proj", "k_norm", "v_norm")] | ||
| ) |
There was a problem hiding this comment.
Hmm, not a fan of having this in the init. It relies on the layers to be constructed but imo it seems to be fully extractable from the config no? Wdyt about making a property out of this that relies on the config or is there anything that would be dangerous with that approach?
There was a problem hiding this comment.
So a property within the base pretrained model
There was a problem hiding this comment.
It needs to read the config, so cannot be a class attribute, and PreTrainedModels don't have init usually! Anyway it's temporary, as it will later be dropped from the hub completely 👌
| # Grab the ones from the child | ||
| self._keys_to_ignore_on_load_unexpected = [ | ||
| f"model.{name}" for name in self.model._keys_to_ignore_on_load_unexpected | ||
| ] |
There was a problem hiding this comment.
Then we can remove all these parent versions imo. It should handle regex so we don't need full keys, no?
There was a problem hiding this comment.
Just to make sure that any model instantiated will have the attrs, as we do not inherit automatically those from children yet (I'm planning on this, and already started a bit)
|
[For maintainers] Suggested jobs to run (before merge) run-slow: gemma4 |
vasqu
left a comment
There was a problem hiding this comment.
Discussed internally
- Property doesnt work because especially since not all model variations need it, e.g. audio
- It's only temporary until all other libraries include the fix + hub update
- Quantize TP is weird but not high prio
…ding (#45336) * drop weights silently * typo * style * skip weird test
…ding (huggingface#45336) * drop weights silently * typo * style * skip weird test
What does this PR do?
As per the title. Follow-up of #45312.
This removes the unnecessary weights, and silently skip them during loading, so that the checkpoints on the hub do not have to be changed for now.