
Gemma4 resizing per layer inputs #45324

Merged
zucchini-nlp merged 12 commits into huggingface:main from zucchini-nlp:gemma4-resizing
Apr 15, 2026

Conversation

@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Apr 8, 2026

What does this PR do?

Fixes #45276 and #45335

In Gemma4, the per-layer inputs also have to be resized, as long as they aren't part of the soft multimodal tokens. A minimal sketch of the idea is below.
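
Roughly what "resizing the per-layer inputs" means (a minimal sketch; the attribute name embed_tokens_per_layer comes from the linked issue, and the actual implementation in this PR handles more cases):

import torch.nn as nn

def resize_per_layer_embeddings(old: nn.Embedding, new_vocab_size: int) -> nn.Embedding:
    # Build a fresh table at the new size and copy over the existing rows,
    # so old token ids keep their learned vectors.
    new = nn.Embedding(new_vocab_size, old.embedding_dim)
    num_kept = min(old.num_embeddings, new_vocab_size)
    new.weight.data[:num_kept] = old.weight.data[:num_kept]
    return new

# e.g. model.model.embed_tokens_per_layer = resize_per_layer_embeddings(
#     model.model.embed_tokens_per_layer, new_vocab_size
# )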

Repro for T5Gemma:

from transformers import T5GemmaForConditionalGeneration

model = T5GemmaForConditionalGeneration.from_pretrained("harshaljanjani/tiny-t5gemma-test")
encoder = model.resize_token_embeddings(model.vocab_size + 10)
decoder = model.model.decoder.embed_tokens
head = model.get_output_embeddings()
print(encoder.weight.shape, decoder.weight.shape, head.weight.shape)
# The LM head resize is reverted because of tying, and the decoder is never resized:
>>> torch.Size([256010, 512]) torch.Size([256000, 512]) torch.Size([256000, 512])

Gemma3n has soft mm tokens, and its current state is not good: I already see unused vocab entries in mm_projection 😢, and if we simply apply the same resizing to the per-layer inputs, we'll get even more unused entries.

This could be done correctly if we filtered out the mm tokens, but I'd prefer to leave that for now.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines -963 to -968
def get_input_embeddings(self):
return self.model.get_input_embeddings()

def set_input_embeddings(self, value):
self.model.set_input_embeddings(value)

Member Author


same as base class, so no need to override
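
For reference, the base-class behavior being relied on is roughly the following (a simplified sketch of PreTrainedModel, not the exact source):

def get_input_embeddings(self):
    # Delegate to the base model (e.g. self.model) when one exists.
    base_model = getattr(self, self.base_model_prefix, self)
    if base_model is not self:
        return base_model.get_input_embeddings()
    raise NotImplementedError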

Comment on lines +964 to +968
# The tying happens from decoder to lm-head, but when resizing
# the resized embed is assigned only to the head. Then tying weights
# again reverts everything back. So we have to update decoder here
if self.config.tie_word_embeddings:
self.model.decoder.embed_tokens = new_embeddings
Member Author


not a fan of it tbh, but I guess it's better than overriding resize
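
For illustration, roughly why the revert happens without this (a toy sketch with made-up sizes; the real tying logic lives in PreTrainedModel.tie_weights):

import torch.nn as nn

decoder_embed = nn.Embedding(100, 8)     # decoder table still at the old vocab size
lm_head = nn.Linear(8, 110, bias=False)  # head already resized to the new vocab

# Tying points the head back at the un-resized decoder table,
# silently undoing the head resize:
lm_head.weight = decoder_embed.weight
print(lm_head.weight.shape)  # torch.Size([100, 8]) -- the resize is gone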

Comment on lines +2158 to +2160
# Input ids should be expanded to the new maximum size of the vocabulary
inputs_dict["input_ids"][:, -2] = new_model_vocab_size - 1

Member Author


If we had had this, we'd have known that the Gemma4 resize doesn't work well. Added now.
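
For context, touching the last valid id of the enlarged vocabulary is what makes an un-resized table fail loudly (a toy sketch of the failure mode, not the actual test):

import torch
import torch.nn as nn

embed = nn.Embedding(100, 8)                 # a table that was never resized
new_model_vocab_size = 110
input_ids = torch.zeros(1, 4, dtype=torch.long)
input_ids[:, -2] = new_model_vocab_size - 1  # id 109 is out of range for this table
embed(input_ids)                             # raises IndexError: index out of range in self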

Comment thread: src/transformers/models/t5gemma/modeling_t5gemma.py
@zucchini-nlp
Member Author

@bot /repo

@github-actions
Contributor

github-actions Bot commented Apr 9, 2026

Repo consistency bot fixed some files and pushed the changes.

Collaborator

@ArthurZucker ArthurZucker left a comment


SGTM, it's a bit dirty to have to add a hook to the module, but it's model-specific so fine by me!

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: blip, colmodernvbert, gemma3, gemma3n, gemma4, lfm2_vl, paligemma, qwen3_vl, qwen3_vl_moe, t5gemma

@zucchini-nlp zucchini-nlp added this pull request to the merge queue Apr 15, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 15, 2026
@zucchini-nlp zucchini-nlp added this pull request to the merge queue Apr 15, 2026
Merged via the queue into huggingface:main with commit b6f9463 Apr 15, 2026
28 checks passed
@zucchini-nlp zucchini-nlp deleted the gemma4-resizing branch April 15, 2026 11:15


Development

Successfully merging this pull request may close these issues.

[gemma4] resize_token_embeddings does not effect to embed_tokens_per_layer or output_embeddings
