[gemma4] Dissociate kv states sharing from the Cache #45312
Merged
Cyrilvallez merged 13 commits into main · Apr 9, 2026
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
zucchini-nlp approved these changes · Apr 8, 2026
zucchini-nlp (Member) left a comment
Thanks, I guess we have to disable GC for the model, and the same fix is needed for Gemma3n
Contributor
[For maintainers] Suggested jobs to run (before merge): run-slow: gemma4
This was referenced Apr 9, 2026
Cyrilvallez added a commit that referenced this pull request · Apr 9, 2026
* force at least cache sharing always
* better
* adjust tests gradients
* fix mask
* fix
* fix
* simplify a lot
* add slow test
* fix
* doc
* improve type hints
CalebisGross added a commit to AppSprout-dev/mnemonic that referenced this pull request · Apr 13, 2026
…ments
- transformers 5.5.3 includes the fix for the Gemma 4 KV sharing issue (huggingface/transformers#45312) that caused all our training failures
- Also updated: datasets 4.8.4, sentence-transformers 5.4.0, wandb 0.25.1
- Removed unused: outlines, flash-linear-attention, causal-conv1d and their deps
- Updated comments to reference the upstream fix while keeping our TrainingCache workaround as a safety net

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This was referenced Apr 13, 2026
sirzechs66 pushed a commit to sirzechs66/transformers that referenced this pull request · Apr 18, 2026
* force at least cache sharing always
* better
* adjust tests gradients
* fix mask
* fix
* fix
* simplify a lot
* add slow test
* fix
* doc
* improve type hints
sharonyu-115 added a commit to sharonyu-115/RL that referenced this pull request · Apr 19, 2026
- Minimize DAPO Gemma4 E2B-it recipe to satisfy configs-minimize-check hook.
- Guard `_needs_kv_cache_for_shared_layers` against non-int `num_kv_shared_layers` (fixes `TypeError` when tests pass a bare MagicMock as the model). Noted in a TODO that this workaround can be removed once transformers>=5.5.2 (huggingface/transformers#45312) lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Shuang Yu <shuangy@nvidia.com>
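A guard of the kind this commit message describes could look roughly like the sketch below. Only the helper name `_needs_kv_cache_for_shared_layers` and the `num_kv_shared_layers` attribute come from the message; the `getattr` chain and return logic are assumptions about that repository, not its actual code.

```python
def _needs_kv_cache_for_shared_layers(model) -> bool:
    """Return True only when the model really has KV-shared layers.

    TODO (assumed wording): remove once transformers>=5.5.2
    (huggingface/transformers#45312) lands, since KV states are then
    shared independently of the Cache.
    """
    num_shared = getattr(getattr(model, "config", None), "num_kv_shared_layers", None)
    # Tests may pass a bare MagicMock as the model; attribute lookups on a mock
    # return another mock rather than an int, so treat anything non-int as "no sharing".
    if not isinstance(num_shared, int):
        return False
    return num_shared > 0
```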
What does this PR do?
As per the title. It was confirmed that the weight matrices of the shared layers are NEVER used, and that kv states should ALWAYS be shared, even during training or inference without a Cache.
I will fully remove those weights in another PR, as they consume memory for no reason.
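To illustrate what "sharing KV states independently of the Cache" means, here is a minimal, hypothetical sketch: layers at or beyond a `first_kv_shared_layer_idx` reuse the key/value states computed by the last non-shared layer, whether or not a Cache object is involved. The class and parameter names (`TinyAttention`, `first_kv_shared_layer_idx`) and the shapes are illustrative only and do not reproduce the actual Gemma modeling code in this PR.

```python
import torch
from torch import nn


class TinyAttention(nn.Module):
    """Toy attention layer that either computes its own KV or reuses shared KV states."""

    def __init__(self, hidden_size: int, is_kv_shared: bool):
        super().__init__()
        self.q_proj = nn.Linear(hidden_size, hidden_size)
        self.is_kv_shared = is_kv_shared
        if not is_kv_shared:
            # KV-shared layers never use their own KV weights, so we simply don't create them.
            self.k_proj = nn.Linear(hidden_size, hidden_size)
            self.v_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states, shared_kv=None):
        q = self.q_proj(hidden_states)
        if self.is_kv_shared:
            # Reuse KV states produced by an earlier layer -- no Cache object required.
            k, v = shared_kv
        else:
            k, v = self.k_proj(hidden_states), self.v_proj(hidden_states)
        attn = torch.softmax(q @ k.transpose(-1, -2) / k.shape[-1] ** 0.5, dim=-1)
        return attn @ v, (k, v)


class TinyModel(nn.Module):
    def __init__(self, hidden_size=16, num_layers=4, first_kv_shared_layer_idx=2):
        super().__init__()
        self.layers = nn.ModuleList(
            TinyAttention(hidden_size, is_kv_shared=i >= first_kv_shared_layer_idx)
            for i in range(num_layers)
        )

    def forward(self, hidden_states):
        shared_kv = None
        for layer in self.layers:
            hidden_states, kv = layer(hidden_states, shared_kv=shared_kv)
            # Keep the KV states of the last non-shared layer for the shared ones to reuse.
            if not layer.is_kv_shared:
                shared_kv = kv
        return hidden_states


# KV sharing here is a property of the layers themselves, so it applies in
# training mode and during inference without any Cache.
out = TinyModel()(torch.randn(1, 5, 16))
```

In the sketch the shared layers simply never own KV projection weights; in the real model those weight matrices still exist but go unused, which is why the description says they will be removed in a follow-up PR.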