🚨 Generation config defaults are now None #42702

Merged
zucchini-nlp merged 24 commits into huggingface:main from zucchini-nlp:generation-config-defaults
Dec 18, 2025

Conversation

@zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Dec 8, 2025

What does this PR do?

As per the title: this should have been done a long time ago, but we couldn't break BC. The current impl breaks BC only halfway, i.e. the generation loop is not affected and will keep using the old defaults. The biggest difference is for users who init, access, or modify the model's generation config directly:

# `0` before this PR, `None` after the PR
print(model.generation_config.no_repeat_ngram_size)
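Downstream code that relied on the old concrete defaults can set them explicitly instead of reading them from the model. A minimal sketch, assuming the values below are the former hard-coded defaults and nothing is read from a checkpoint:

from transformers import GenerationConfig

# Restore the old behavior explicitly instead of relying on
# model.generation_config, whose unset fields are now None.
explicit_config = GenerationConfig(no_repeat_ngram_size=0, do_sample=False, num_beams=1)
print(explicit_config.no_repeat_ngram_size)  # 0, as before this PR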

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment thread src/transformers/generation/candidate_generator.py
Comment thread src/transformers/generation/configuration_utils.py
Comment thread src/transformers/generation/configuration_utils.py
Comment thread src/transformers/generation/watermarking.py
Comment on lines +1069 to +1074
generation_params = {}
default_config = self.__class__().to_dict() if not self.has_no_defaults_at_init else {}
for key in GenerationConfig._get_default_generation_params().keys():
if hasattr(self, key) and getattr(self, key) is not None and key not in default_config:
generation_params[key] = getattr(self, key)

Member Author

could have been simplified because we no longer have any generation params in model.config

Contributor

Just for my understanding: the generation config is top-level either way, hence we no longer need to distinguish submodels etc. in composite models (which could possibly have different config values here)?

Member Author

@zucchini-nlp zucchini-nlp Dec 11, 2025

Yeah, that, and also because these lines are more of a workaround for old models (e.g. bart). New models don't have any generation params in the model config anyway; we haven't allowed it for quite a long time.

Contributor

Gotcha, makes sense to me

Comment thread tests/models/reformer/test_modeling_reformer.py
@zucchini-nlp zucchini-nlp requested a review from vasqu December 10, 2025 12:39
@zucchini-nlp
Member Author

Ready for review!

Contributor

@vasqu vasqu left a comment

Let's still add an 🚨 even if it's not completely breaking; I'd rather be safe than sorry here. We never know.

First round of comments; my biggest issue would be the kwargs vs. generation config passing. But you also left a note there.

Comment thread src/transformers/configuration_utils.py Outdated

Comment thread src/transformers/generation/candidate_generator.py Outdated
Comment thread src/transformers/generation/configuration_utils.py
Comment thread src/transformers/generation/configuration_utils.py
Comment thread src/transformers/generation/utils.py Outdated
Comment thread src/transformers/generation/watermarking.py Outdated
Comment thread src/transformers/models/whisper/generation_whisper.py
Comment thread tests/utils/test_cache_utils.py
zucchini-nlp and others added 2 commits December 11, 2025 11:54
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
@zucchini-nlp zucchini-nlp changed the title Generation config defaults are now None 🚨 Generation config defaults are now None Dec 11, 2025
@zucchini-nlp
Member Author

run-slow: bart, csm, dia, encoder_decoder, musicgen, rag, reformer, speech_encoder_decoder, vision_encoder_decoder, whisper

@github-actions
Contributor

This comment contains run-slow, running the specified jobs:

models: ["models/bart", "models/csm", "models/dia", "models/encoder_decoder", "models/musicgen", "models/rag", "models/reformer", "models/speech_encoder_decoder", "models/vision_encoder_decoder", "models/whisper"]
quantizations: []

@github-actions
Contributor

CI Results

Workflow Run ⚙️

✅ No failing test specific to this PR 🎉 !

@zucchini-nlp
Member Author

@vasqu requesting another review :)

@zucchini-nlp zucchini-nlp requested a review from vasqu December 11, 2025 12:23
Contributor

@vasqu vasqu left a comment

LGTM, just a few last nits


-        elif self.num_beams == 1:
-            if self.do_sample is False:
+        elif self.num_beams is None or self.num_beams == 1:
+            if self.do_sample is not True:
Contributor

Looks like it was missed here?
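
For context, a hedged illustration of the None-aware check being discussed (standalone simplified code, not the actual library code):

num_beams = None   # old default was 1; unset is now None
do_sample = None   # old default was False; unset is now None

# Treat the unset (None) value like the old default when picking the decoding mode.
if num_beams is None or num_beams == 1:   # previously `num_beams == 1`
    if do_sample is not True:             # previously `do_sample is False`
        mode = "greedy search"

print(mode)  # "greedy search"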

Comment thread src/transformers/generation/utils.py Outdated
Comment on lines +1768 to +1769
# user-defined kwargs or `generation_config` > `self.generation_config` > global default values
# NOTE: doesn't make sense to allow kwargs and `generation_config`. Might be strict and make them mutually exclusive?
Contributor

Fair enough, but maybe we can break for v5? Not super important, but it gives us a good opportunity to do so.

Either way, let's upgrade this to a TODO (as well).
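
For reference, a hedged sketch of the precedence order stated in the quoted comment (a hypothetical resolve helper, not the library's merge code):

def resolve(user_value, saved_value, global_default):
    # user-defined kwargs / `generation_config` > model's `generation_config` > global defaults
    if user_value is not None:
        return user_value
    if saved_value is not None:
        return saved_value
    return global_default

print(resolve(None, 20, 50))  # 20: the model's saved value wins when the user passes nothing
print(resolve(5, 20, 50))     # 5: the explicit user value wins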

Comment thread tests/generation/test_configuration_utils.py Outdated
Comment thread tests/utils/test_cache_utils.py
@zucchini-nlp
Member Author

Will merge when CI is fixed on main

@albertvillanova
Member

albertvillanova commented Dec 15, 2025

Hi,

First of all, thanks for your quick reaction and for addressing the underlying issue.

On the other hand, I did a quick pass over the generation code, and I think there is a subtle semantic point worth double-checking: None is not always just "unset", it can also mean "disable this behavior".

Concretely, top_k=None disables top-k filtering entirely and allows all tokens. Because of that, we now have two different concepts that can both be represented as "None":

  • Use the global/default value for top_k (e.g. 50)
  • Explicitly disable top-k filtering (top_k=None or 0) even if the model's generation config or the global defaults say otherwise

Therefore, a situation can occur during training where:

  • the user-provided generation_config has top_k=None (intending to disable top-k filtering),
  • the model’s own generation_config has a non-None value for top_k,
  • merging logic currently preserves the model’s value instead of respecting the explicit None.

In that scenario, None is not just "unset"; it is a meaningful instruction ("don't apply top-k filtering"). If so, the merge semantics may need refinement to avoid unintentionally re-enabling filtering (e.g. by using a sentinel value instead).
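
A minimal sketch of the ambiguity described above (hypothetical dicts, not the library's actual merge logic):

user_config = {"top_k": None}   # reading 1: "use the default"; reading 2: "disable top-k filtering"
saved_config = {"top_k": 40}    # value stored in the model's generation config

# A merge that treats None purely as "unset" keeps the saved value,
# silently re-enabling filtering if the user meant "disable".
merged = {key: saved_config.get(key) if value is None else value for key, value in user_config.items()}
print(merged)  # {'top_k': 40}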

I'll continue digging into this, but flagging it early so we can discuss it before merging.

@zucchini-nlp
Copy link
Copy Markdown
Member Author

@albertvillanova I think if top_k=None in the generation config, it is the same as if the user did not pass any top_k in kwargs. A value not being set (None) does not specifically mean that the user is requesting not to use it, so users would need to explicitly unset it with top_k=0 if the model has saved a different value.

Unfortunately, we have no way to know 100% what users want when they set values to None in the current code. The only way would be for us to not update the generation config with the model's defaults when users pass my_generation_config. But that would be much more breaking and would require users to always create a custom config from model.generation_config.

For example, if everyone prepared custom configs as below, we could fix your issue. I'm afraid that's not the case for most users:

my_generation_config = model.generation_config
my_generation_config.top_k = None
model.generate(inputs, generation_config=my_generation_config)

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: bart, csm, dia, encoder_decoder, musicgen, rag, reformer, speech_encoder_decoder, vision_encoder_decoder, whisper

@vasqu
Contributor

vasqu commented Dec 17, 2025

I think we can merge? @zucchini-nlp

@zucchini-nlp zucchini-nlp merged commit a81e04a into huggingface:main Dec 18, 2025
25 checks passed
modified_values = {}
global_default_generation_config = GenerationConfig()
model_generation_config = self.generation_config
# we iterate over the model's generation config: it may hold custom keys, which we'll want to copy
Contributor

@ebezzam ebezzam Dec 19, 2025

@vasqu, @zucchini-nlp are custom keys no longer copied over?

Seems like only the ones here are defaulted at this line, which wouldn't copy custom keys from self.generation_config anymore:

global_defaults = self.generation_config._get_default_generation_params()
generation_config.update(**self.generation_config.to_dict(), defaults_only=True)
generation_config.update(**global_defaults, defaults_only=True)

Contributor

Something like this just after?

# add custom keys not in global defaults
for key, value in self.generation_config.to_dict().items():
    if not hasattr(generation_config, key):
        setattr(generation_config, key, value)
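
For illustration, a hypothetical example of the kind of custom key such a loop would preserve (assuming GenerationConfig keeps unknown kwargs as extra attributes):

from transformers import GenerationConfig

custom = GenerationConfig(my_task_specific_flag=True)  # hypothetical custom key, not a global default
print(hasattr(custom, "my_task_specific_flag"))        # True: the kind of key the loop above copies over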

SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* this way better maybe?

* delete legacy from bart and mvp

* import not found

* fix some tests

* fix more tests

* revert smth to run tests again

* I thought I fixed it already, but there were more models

* commit and check tests, clean-up later

* assisted decoding should work now

* docs and whisper

* fix a few more tests

* no circular import errors pls

* wording

* add a test for defaults following TRL example

* nit

* Update src/transformers/configuration_utils.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Update src/transformers/generation/utils.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* comments

* final fix tests

* more comments

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
@ebezzam ebezzam mentioned this pull request Feb 4, 2026