
Gemma4 resizing per layer inputs #45324

Merged
zucchini-nlp merged 12 commits into huggingface:main from zucchini-nlp:gemma4-resizing
Apr 15, 2026

Conversation

@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Apr 8, 2026

What does this PR do?

Fixes #45276 and #45335

In Gemma4, the per-layer inputs also have to be resized, as long as they aren't part of the soft multimodal tokens. A minimal sketch of the idea is below.
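
Roughly what "resizing the per-layer inputs" means (a minimal sketch; the attribute name embed_tokens_per_layer comes from the linked issue, and the actual implementation in this PR handles more cases):

import torch.nn as nn

def resize_per_layer_embeddings(old: nn.Embedding, new_vocab_size: int) -> nn.Embedding:
    # Build a fresh table at the new size and copy over the existing rows,
    # so old token ids keep their learned vectors.
    new = nn.Embedding(new_vocab_size, old.embedding_dim)
    num_kept = min(old.num_embeddings, new_vocab_size)
    new.weight.data[:num_kept] = old.weight.data[:num_kept]
    return new

# e.g. model.model.embed_tokens_per_layer = resize_per_layer_embeddings(
#     model.model.embed_tokens_per_layer, new_vocab_size
# )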

Repro for T5Gemma:

from transformers import T5GemmaForConditionalGeneration

model = T5GemmaForConditionalGeneration.from_pretrained("harshaljanjani/tiny-t5gemma-test")
encoder = model.resize_token_embeddings(model.vocab_size + 10)
decoder = model.model.decoder.embed_tokens
head = model.get_output_embeddings()
print(encoder.weight.shape, decoder.weight.shape, head.weight.shape)
# The LM head resize is reverted because of tying, and the decoder is never resized:
>>> torch.Size([256010, 512]) torch.Size([256000, 512]) torch.Size([256000, 512])

Gemma3n has soft mm tokens, and its current state is not good: I already see unused vocab entries in mm_projection 😢, and if we simply apply the same resizing to the per-layer inputs, we'll get even more unused entries.

This could be done correctly if we filtered out the mm tokens, but I'd prefer to leave that for now.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines -963 to -968
def get_input_embeddings(self):
return self.model.get_input_embeddings()

def set_input_embeddings(self, value):
self.model.set_input_embeddings(value)

Member Author


same as base class, so no need to override
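
For reference, the base-class behavior being relied on is roughly the following (a simplified sketch of PreTrainedModel, not the exact source):

def get_input_embeddings(self):
    # Delegate to the base model (e.g. self.model) when one exists.
    base_model = getattr(self, self.base_model_prefix, self)
    if base_model is not self:
        return base_model.get_input_embeddings()
    raise NotImplementedError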

Comment on lines +964 to +968
# The tying happens from decoder to lm-head, but when resizing
# the resized embed is assigned only to the head. Then tying weights
# again reverts everything back. So we have to update decoder here
if self.config.tie_word_embeddings:
self.model.decoder.embed_tokens = new_embeddings
Member Author


not a fan of it tbh, but I guess it's better than overriding resize
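
For illustration, roughly why the revert happens without this (a toy sketch with made-up sizes; the real tying logic lives in PreTrainedModel.tie_weights):

import torch.nn as nn

decoder_embed = nn.Embedding(100, 8)     # decoder table still at the old vocab size
lm_head = nn.Linear(8, 110, bias=False)  # head already resized to the new vocab

# Tying points the head back at the un-resized decoder table,
# silently undoing the head resize:
lm_head.weight = decoder_embed.weight
print(lm_head.weight.shape)  # torch.Size([100, 8]) -- the resize is gone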

Comment on lines +2158 to +2160
# Input ids should be expanded to the new maximum size of the vocabulary
inputs_dict["input_ids"][:, -2] = new_model_vocab_size - 1

Member Author


If we had had this, we'd have known that the Gemma4 resize doesn't work well. Added now.
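
For context, touching the last valid id of the enlarged vocabulary is what makes an un-resized table fail loudly (a toy sketch of the failure mode, not the actual test):

import torch
import torch.nn as nn

embed = nn.Embedding(100, 8)                 # a table that was never resized
new_model_vocab_size = 110
input_ids = torch.zeros(1, 4, dtype=torch.long)
input_ids[:, -2] = new_model_vocab_size - 1  # id 109 is out of range for this table
embed(input_ids)                             # raises IndexError: index out of range in self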

Comment thread: src/transformers/models/t5gemma/modeling_t5gemma.py
@zucchini-nlp
Member Author

@bot /repo

@github-actions
Contributor

github-actions Bot commented Apr 9, 2026

Repo consistency bot fixed some files and pushed the changes.

Collaborator

@ArthurZucker ArthurZucker left a comment


SGTM, it's a bit dirty to have to add a hook to the module, but it's model-specific so fine by me!

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: blip, colmodernvbert, gemma3, gemma3n, gemma4, lfm2_vl, paligemma, qwen3_vl, qwen3_vl_moe, t5gemma

@zucchini-nlp zucchini-nlp added this pull request to the merge queue Apr 15, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 15, 2026
@zucchini-nlp zucchini-nlp added this pull request to the merge queue Apr 15, 2026
Merged via the queue into huggingface:main with commit b6f9463 Apr 15, 2026
28 checks passed
@zucchini-nlp zucchini-nlp deleted the gemma4-resizing branch April 15, 2026 11:15


Development

Successfully merging this pull request may close these issues.

[gemma4] resize_token_embeddings does not effect to embed_tokens_per_layer or output_embeddings
