
[gemma4] Remove all shared weights, and silently skip them during loading #45336

Merged

Cyrilvallez merged 5 commits into main from drop-gemma4-weights on Apr 9, 2026
Conversation

@Cyrilvallez
Member

What does this PR do?

As per the title. Follow-up of #45312.
This removes the unnecessary weights and silently skips them during loading, so that the checkpoints on the Hub do not have to be changed for now.
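For context, `_keys_to_ignore_on_load_unexpected` holds regex patterns that the loading code matches against checkpoint keys the model has no parameter for; matching keys are dropped without the usual "unexpected keys" warning. A simplified sketch of that matching step (illustrative only, the real logic lives inside `PreTrainedModel.from_pretrained`):

import re

# Simplified sketch: checkpoint keys with no matching model parameter are
# normally reported as "unexpected"; any key matching one of the patterns
# in `_keys_to_ignore_on_load_unexpected` is silently dropped instead.
def filter_unexpected_keys(unexpected_keys, patterns):
    if not patterns:
        return unexpected_keys
    return [k for k in unexpected_keys if not any(re.search(p, k) for p in patterns)]

keys = ["layers.3.self_attn.k_proj.weight", "layers.3.self_attn.q_proj.weight"]
print(filter_unexpected_keys(keys, ["layers.3.self_attn.k_proj"]))
# -> ['layers.3.self_attn.q_proj.weight']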

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Contributor

@vasqu left a comment


Some initial comments: 1 nit re ternaries (I don't think it is easy to change because of the v proj) and 1 bigger point on the tied weights keys: imo, we should have this as a property under the pretrained model directly and avoid it in all those inits, especially since we can extract them from the config 🤔 Maybe I'm overlooking something tho

Comment thread: src/transformers/models/gemma4/modeling_gemma4.py
Comment on lines +1541 to +1547
# Update `_keys_to_ignore_on_load_unexpected` to drop all k/v proj and norms for the shared layers
self._keys_to_ignore_on_load_unexpected = []
for i, layer in enumerate(self.layers):
if layer.self_attn.is_kv_shared_layer:
self._keys_to_ignore_on_load_unexpected.extend(
[f"layers.{i}.self_attn.{name}" for name in ("k_proj", "v_proj", "k_norm", "v_norm")]
)
Contributor


Hmm, not a fan of having this in the init. It relies on the layers being constructed, but imo it seems to be fully extractable from the config, no? Wdyt about making a property out of this that relies on the config, or is there anything that would be dangerous with that approach?
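Something like this rough sketch, assuming (hypothetically) that the config exposes which layers share k/v, e.g. via a `num_kv_shared_layers` count covering the last layers:

from types import SimpleNamespace

class Gemma4TextModelSketch:  # hypothetical, illustrative only
    def __init__(self, config):
        self.config = config

    @property
    def _keys_to_ignore_on_load_unexpected(self):
        # Assumes the last `num_kv_shared_layers` layers are the shared ones
        # (hypothetical config attribute; the real config may differ).
        first_shared = self.config.num_hidden_layers - self.config.num_kv_shared_layers
        return [
            f"layers.{i}.self_attn.{name}"
            for i in range(first_shared, self.config.num_hidden_layers)
            for name in ("k_proj", "v_proj", "k_norm", "v_norm")
        ]

config = SimpleNamespace(num_hidden_layers=4, num_kv_shared_layers=1)
print(Gemma4TextModelSketch(config)._keys_to_ignore_on_load_unexpected)
# -> ['layers.3.self_attn.k_proj', 'layers.3.self_attn.v_proj', ...]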

Contributor


So a property within the base pretrained model

Member Author


It needs to read the config, so it cannot be a class attribute, and PreTrainedModels don't usually have an init! Anyway, it's temporary, as it will later be dropped from the hub completely 👌

Comment on lines +1712 to +1715
# Grab the ones from the child
self._keys_to_ignore_on_load_unexpected = [
f"model.{name}" for name in self.model._keys_to_ignore_on_load_unexpected
]
Contributor


Then we can remove all these parent versions imo. It should handle regex, so we don't need full keys, no?
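For instance (a sketch, assuming the patterns go through an unanchored `re.search` during loading, as sketched above):

import re

# An unanchored search means the child's unprefixed pattern already matches
# the parent's "model."-prefixed key, so re-prefixing would be redundant:
pattern = "layers.3.self_attn.k_proj"
print(bool(re.search(pattern, "layers.3.self_attn.k_proj.weight")))        # True
print(bool(re.search(pattern, "model.layers.3.self_attn.k_proj.weight")))  # True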

Member Author


Just to make sure that any model instantiated will have the attrs, as we do not automatically inherit those from children yet (I'm planning on this, and have already started a bit)

@github-actions
Contributor

github-actions Bot commented Apr 9, 2026

[For maintainers] Suggested jobs to run (before merge)

run-slow: gemma4

Contributor

@vasqu left a comment


Discussed internally

  • Property doesn't work, especially since not all model variations need it, e.g. audio
  • It's only temporary until all other libraries include the fix + hub update
  • Quantized TP is weird but not high prio

Cyrilvallez merged commit 9f8ddaa into main on Apr 9, 2026
21 checks passed
Cyrilvallez deleted the drop-gemma4-weights branch on April 9, 2026 at 13:23
Cyrilvallez added a commit that referenced this pull request Apr 9, 2026
…ding (#45336)

* drop weights silently

* typo

* style

* skip weird test
sirzechs66 pushed a commit to sirzechs66/transformers that referenced this pull request Apr 18, 2026
…ding (huggingface#45336)

* drop weights silently

* typo

* style

* skip weird test