Make gradient-checkpoint enabling tolerant of models without get_input_embeddings#42558

Merged
molbap merged 31 commits into main from fix_enable_grads_again
Dec 17, 2025

Conversation

@molbap
Contributor

@molbap molbap commented Dec 2, 2025

What does this PR do?

As the title indicates, #42542 and likely a few other models were broken by the merged #41993. This PR adds an embedding getter and attempts to test the feature with more coverage.

Basically what it does

  • Stop hard-failing gradient_checkpointing_enable when a model lacks get_input_embeddings. We now just call enable_input_require_grads, let it attach hooks where it can, and issue a single warning if no embedding module is found.
  • Simplify enable_input_require_grads (and the InternVL/MLCD overrides, plus a couple of other model adjustments) by making them responsible for the warning.
  • Add a big test to make sure all of that works (please take a look).

Hopefully this helps GC with PEFT adapters for many VLMs (and plain text models too).
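Rough sketch of the new behaviour (simplified and illustrative: the standalone function name, exact control flow, and warning text below are not the actual diff):

```python
import logging
from typing import Optional

import torch.nn as nn

logger = logging.getLogger(__name__)


def make_inputs_require_grads(module, inputs, output):
    # Hook attached to the embedding layer so that its output requires grad,
    # which reentrant gradient checkpointing needs.
    output.requires_grad_(True)


def enable_input_require_grads_tolerant(model: nn.Module) -> None:
    """Attach the require-grads hook wherever an embedding module is found;
    warn once instead of hard-failing when none is exposed."""
    found_embeddings = False
    for module in model.modules():
        get_embeds = getattr(module, "get_input_embeddings", None)
        if get_embeds is None:
            continue
        try:
            input_embeddings: Optional[nn.Module] = get_embeds()
        except NotImplementedError:
            continue
        if input_embeddings is not None:
            input_embeddings.register_forward_hook(make_inputs_require_grads)
            found_embeddings = True
    if not found_embeddings:
        logger.warning(
            f"{model.__class__.__name__} does not expose input embeddings; "
            "gradients cannot flow back to the token embeddings when using "
            "adapters or gradient checkpointing."
        )
```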

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@molbap
Contributor Author

molbap commented Dec 2, 2025

Added a test as well, but I can't find a clean way to handle the models for which a getter method isn't relevant without causing a lot of side effects. WDYT @zucchini-nlp? Kind of stumped (try/excepting at a higher level would always work, but it hides a lot).

Member

@zucchini-nlp zucchini-nlp left a comment


the models for which a getter method isn't relevant without causing a lot of side effects

Would this also mean that we can't correctly support PEFT and GC with these models, or do they have a custom way to set grads on the inputs? We could raise an error with a better message saying that the model doesn't support these features unless it has a way to get its input embeddings, wdyt?

Comment thread src/transformers/modeling_utils.py Outdated
Comment on lines +985 to +987
base_model = getattr(self, "base_model_prefix", None)
if base_model is not None:
base_model = getattr(self, base_model, None)
Member


nit: self.base_model property has the same functionality

Contributor Author


true!

Comment thread src/transformers/modeling_utils.py Outdated
_input_embed_layer = "embed_tokens" # default layer that holds input embeddings.

def get_input_embeddings(self) -> nn.Module:
def _get_input_embeddings_no_raise(self) -> Optional[nn.Module]:
Member


oh interesting, I was assuming the base get_input_embeddings already returned None

Contributor Author


well, I ended up in so many little edge cases lol

@molbap
Contributor Author

molbap commented Dec 3, 2025

Yes, it's a good idea to raise/inform downstream users. I reverted a couple of things and will update the test so it actually checks that enabling GC works (probably by adding another test).

Contributor Author


This is mostly to fix a broken env situation that can arise around timm_wrapper (or timm_backbone?), so it protects a few imports.

@molbap
Contributor Author

molbap commented Dec 4, 2025

I reverted a few models to inner positional embedding calls as mentioned in #38913.

Modified a few other models, as the test I added (test_enable_input_require_grads_with_gradient_checkpointing) was a bit naive and I was just continue-ing; now it's a proper skip if the loss is undefined.

Hopefully that helps VLMs + GC and does not break adapters.
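For reference, the flow this is meant to keep working looks roughly like this (the model name and kwargs are only illustrative):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

# No longer hard-fails on models whose class does not expose get_input_embeddings:
# hooks are attached wherever an embedding module is found, otherwise a single
# warning is emitted.
model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": True})

# Still available as an explicit call, e.g. for PEFT adapter training.
model.enable_input_require_grads()
```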

Comment on lines +1987 to +1990
try:
input_embeddings = module.get_input_embeddings()
except NotImplementedError:
continue
Contributor Author


no simple way around this unfortunately

Member


oke, I think with the warning below, it is more explicit

Comment thread src/transformers/modeling_utils.py Outdated
Comment on lines +2007 to +2011
if not found_embeddings:
logger.warning_once(
f"{self.__class__.__name__} does not expose input embeddings. Gradients cannot flow back to the token "
"embeddings when using adapters or gradient checkpointing. Override `get_input_embeddings` to fully "
"support those features."
Contributor Author


at least we can warn users!

Member


nice!

@molbap molbap changed the title from "Add embedding getter + test" to "Make gradient-checkpoint enabling tolerant of models without get_input_embeddings" on Dec 4, 2025
Member

@zucchini-nlp zucchini-nlp left a comment


It is a pity that there are so many edge cases to handle. Raising a warning seems like a good solution to not silently skip exceptions

I think there are some unrelated changes with rope, let's revert those before merging

Comment thread src/transformers/models/internvl/modeling_internvl.py
Comment thread src/transformers/models/altclip/modeling_altclip.py
Comment on lines +337 to +340
rotary_embeddings = position_embeddings
if rotary_position_tensor is not None:
rotary_embeddings = (rotary_position_tensor.cos(), rotary_position_tensor.sin())

Member


the position_embeddings are already supposed to be present, so we don't need the embeddings here, right?

Contributor Author


That one is a little bit more annoying. I'm trying to revert it, but the thing is this model should work except that it's a tuple... and gradient checkpointing does not disable grads on tuples, only on tensors passed as positional args.

So this was a hacky trick (for the use_reentrant case).

Member


Weird, we use tuple cos/sin in most LLMs. I'd prefer to skip this model's test instead of fixing it by duplicating args; I think this model is not used as commonly.

Contributor Author


I did skip it, yes, because it was being annoying 😁. And yes, we use them, but they don't usually carry grads; here (in that particular model) they do.


Comment on lines +2952 to +2955
needs_embedding_grads = self.main_input_name == "input_ids"
# we use that also to detect whether or not we have to raise if embeddings are missing (the submodel might not have embeddings at all)
enable_input_grads = needs_embedding_grads or getattr(self, "_hf_peft_config_loaded", False)
if enable_input_grads:
Member


hmm, for my understanding, why do we always need to enable grads when doing GC training with text models?

Contributor Author


We don't always, but we do with reentrant checkpointing. IIUC it's not to actually use these gradients; it's that torch.utils.checkpoint needs at least one input and one output that actually require gradients, else the checkpointed part will not have a gradient.
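A minimal, self-contained illustration of that point in plain PyTorch (not transformers code; the frozen embedding stands in for a PEFT-style setup where the base weights are frozen):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

embed = nn.Embedding(10, 8)
embed.weight.requires_grad_(False)   # base weights frozen, as with adapters
layer = nn.Linear(8, 8)              # stands in for the trainable block
ids = torch.tensor([[1, 2, 3]])

# Without the hook: the checkpointed segment gets no tensor input that requires
# grad, so its output does not require grad and the graph is cut (PyTorch also
# warns "None of the inputs have requires_grad=True").
hidden = embed(ids)
out = checkpoint(layer, hidden, use_reentrant=True)
print(out.requires_grad)             # False

# enable_input_require_grads works around this with a forward hook on the
# embedding that flips requires_grad on its output.
def make_inputs_require_grads(module, inputs, output):
    output.requires_grad_(True)

embed.register_forward_hook(make_inputs_require_grads)
hidden = embed(ids)
out = checkpoint(layer, hidden, use_reentrant=True)
out.sum().backward()
print(layer.weight.grad is not None)  # True, gradients flow again
```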

Comment on lines +922 to +933
def test_enable_input_require_grads_with_gradient_checkpointing(self):
if not getattr(self.model_tester, "is_training", False):
self.skipTest(reason="ModelTester is not configured to run training tests")

config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()
if hasattr(config, "use_cache"):
config.use_cache = False

has_verified_model = False

for model_class in self.all_model_classes:
if not getattr(model_class, "supports_gradient_checkpointing", False):
Member


I see now what you meant earlier, this test has a lot of edge cases

Contributor Author


yes, it's a bit clunky to have this bool flag, but I wasn't seeing a simpler option

Member

@zucchini-nlp zucchini-nlp left a comment


Thanks again for handling all the edge cases, this was not an easy one.

@molbap
Contributor Author

molbap commented Dec 5, 2025

run-slow: align, altclip, chinese_clip, clap, clvp, falcon_mamba, fast_vlm, internvl, layoutlm, layoutlmv3, lilt, mamba, markuplm, mlcd, poolformer, siglip

@github-actions
Contributor

github-actions Bot commented Dec 5, 2025

This comment contains run-slow, running the specified jobs:

models: ["models/align", "models/altclip", "models/chinese_clip", "models/clap", "models/clvp", "models/falcon_mamba", "models/fast_vlm", "models/internvl", "models/layoutlm", "models/layoutlmv3", "models/lilt", "models/mamba", "models/markuplm", "models/mlcd", "models/poolformer", "models/siglip"]
quantizations: []

@github-actions
Contributor

github-actions Bot commented Dec 5, 2025

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@molbap molbap requested a review from ArthurZucker December 5, 2025 16:34
Collaborator

@ArthurZucker ArthurZucker left a comment


Kudos, very nice PR!

Comment thread src/transformers/modeling_utils.py Outdated
Comment on lines +2008 to +2009
"embeddings when using adapters or gradient checkpointing. Override `get_input_embeddings` to fully "
"support those features."
Collaborator


either that or sometimes just add a _input_embedding_layer

Collaborator


i mean update the message please

Comment on lines +913 to +914
if not hasattr(model, "get_input_embeddings"):
continue
Collaborator


Why not raise an error instead? This way, all new models will make sure they have this go green before merging.

Contributor Author


forgot to answer but: this would currently raise for many existing models

Comment on lines +923 to +924
if not getattr(self.model_tester, "is_training", False):
self.skipTest(reason="ModelTester is not configured to run training tests")
Collaborator


If this one is True by default for all models, sounds good.

Contributor Author


Yes, AFAIK it's true for CausalLMTester.

Collaborator

@ArthurZucker ArthurZucker left a comment


First of all, thanks!
This will also fix some TP recompile issues on hidden_states=hidden_states, cc @3outeille.

Comment thread src/transformers/modeling_utils.py Outdated
logger.warning_once(
f"{self.__class__.__name__} does not expose input embeddings. Gradients cannot flow back to the token "
"embeddings when using adapters or gradient checkpointing. Override `get_input_embeddings` to fully "
"support those features."
Collaborator


Suggested change
"support those features."
"support those features, or add the `_input_embedding_layer` attribut with the name of the embedding layer!"

Comment thread tests/test_modeling_common.py Outdated
grad_after_gc = embedding_param.grad
self.assertIsNotNone(
grad_after_gc,
f"{model_class.__name__} should produce embedding gradients when gradient checkpointing is enabled.",
Collaborator


if you have an idea of what could cause this to fail, add it!

Comment on lines +1000 to +1012
f"{model_class.__name__} produced non-finite gradients with gradient checkpointing enabled.",
)
self.assertGreater(
grad_after_gc.abs().sum().item(),
0,
f"{model_class.__name__} should keep non-zero embedding gradients with gradient checkpointing enabled.",
)
has_verified_model = True

if not has_verified_model:
self.skipTest(
reason="No model with a differentiable loss was available to verify enable_input_require_grads with gradient checkpointing."
)
Collaborator


Same for each one of these, let's help the user fix it!

Collaborator

@ArthurZucker ArthurZucker left a comment


Very nice! Small updates and let's merge.


@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: align, altclip, chinese_clip, clap, clvp, falcon_mamba, fast_vlm, internvl, layoutlm, layoutlmv3, lilt, mamba, markuplm, mlcd, poolformer, siglip

@molbap
Contributor Author

molbap commented Dec 17, 2025

run-slow: align, altclip, chinese_clip, clap, clvp, falcon_mamba, fast_vlm, internvl, layoutlm, layoutlmv3, lilt, mamba, markuplm, mlcd, poolformer, siglip

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/align", "models/altclip", "models/chinese_clip", "models/clap", "models/clvp", "models/falcon_mamba", "models/fast_vlm", "models/internvl", "models/layoutlm", "models/layoutlmv3", "models/lilt", "models/mamba", "models/markuplm", "models/mlcd", "models/poolformer", "models/siglip"]
quantizations: []

@molbap
Contributor Author

molbap commented Dec 17, 2025

Comments addressed. Merging and keeping an eye on this 👀; let's see if something breaks and how.

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@molbap molbap merged commit b712a97 into main Dec 17, 2025
27 checks passed
@molbap molbap deleted the fix_enable_grads_again branch December 17, 2025 16:30
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
…t_embeddings (huggingface#42558)

* add embedding getter

* modify your own logic

* a common test

* some adapters are not PreTrainedModel s

* few fixes

* implement correct-ish fix?

* fixup

* this is needed likely

* woops

* solving some cross-imports issues here and there

* more ximports issues

* finally

* revert changes

* fixups

* improve message

* add common tests for input_ids first

* increase test coverage

* bigger update for GC

* copies

* mlcd is getting on my nerves a bit

* ah yes

* for BC

* break a couple modelings

* simplify with base_model

* fix copies for torch checkpointing

* simplify this model

* improve messages