🗜 Hotfix: avoid passing quantization_config=None #4019

Merged: qgallouedec merged 6 commits into main from fix-passing-model-kwargs on Sep 9, 2025
Conversation

@qgallouedec (Member)

Passing

model = AutoModelForCausalLM.from_pretrained("my_model", quantization_config=None)

isn't the same as

model = AutoModelForCausalLM.from_pretrained("my_model")

which causes GPT-OSS models to fail to load when used with the TRL CLI.
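
For context, here is a minimal sketch of the guard this hotfix applies (the model_init_kwargs dict and its contents are illustrative, not the exact diff): None-valued kwargs are dropped before being forwarded to from_pretrained, so an explicit quantization_config=None behaves like an omitted argument.

from transformers import AutoModelForCausalLM

# Illustrative sketch, not the exact diff: filter out None-valued entries so
# quantization_config=None is never forwarded to from_pretrained.
model_init_kwargs = {"torch_dtype": "auto", "quantization_config": None}
model_init_kwargs = {k: v for k, v in model_init_kwargs.items() if v is not None}

model = AutoModelForCausalLM.from_pretrained("my_model", **model_init_kwargs)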

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@albertvillanova (Member) left a comment

Thanks for the PR: definitely, GPT OSS models should be supported in TRL!!!

That said, I'm not fully convinced we need to apply the "filter out all None kwargs" policy across the board. Most parameters in from_pretrained are robust to None: have you seen any issue with any parameter other than quantization_config?

In my view, there are two complementary points here:

  1. Upstream issue in Transformers: do you know why transformers treats a missing quantization_config differently from an explicit None? I think this might be a bug. In my opinion, passing None should be treated the same as omitting the key altogether, not as an invalid quantization config. It would be worth raising or linking an issue upstream so we can align behavior there.
    • I can have a look at this.
  2. Targeted fix in TRL: While investigating and waiting for an upstream fix, it makes sense for TRL to guard specifically against this problem.

What about stripping only quantization_config (and the associated device_map) when it is None? This would keep the fix minimal and avoid unnecessarily rewriting the kwargs for unrelated arguments. What do you think?
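
A sketch of that targeted variant (strip_null_quantization is a hypothetical helper name, not the merged code):

def strip_null_quantization(model_init_kwargs: dict) -> dict:
    # Hypothetical helper sketching the targeted guard suggested above:
    # remove quantization_config, plus the device_map that accompanies it,
    # only when quantization_config is explicitly None.
    kwargs = dict(model_init_kwargs)
    if kwargs.get("quantization_config") is None:
        kwargs.pop("quantization_config", None)
        kwargs.pop("device_map", None)
    return kwargs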

(Inline review threads on trl/trainer/gkd_trainer.py and trl/trainer/online_dpo_trainer.py.)
@qgallouedec (Member, Author)

Yeah, good point.

> have you seen any issue with any parameter other than quantization_config?

No.

> do you know why transformers treats a missing quantization_config differently from an explicit None?

I think it comes from these lines; probably a bug, but I haven't had time to look into it further.

https://github.com/huggingface/transformers/blob/37c14430c99edca79dfcdcb76f1209f291b12fab/src/transformers/configuration_utils.py#L962-L967
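
To illustrate the suspected failure mode (this is a hypothetical sketch, not the actual transformers code at those lines):

# Hypothetical sketch of the suspected failure mode, not the actual
# transformers code: a key-membership check treats an explicit None
# differently from an omitted key.
kwargs = {"quantization_config": None}

if "quantization_config" in kwargs:   # True even though the value is None
    quantization_config = kwargs["quantization_config"]
    # downstream code then tries to interpret None as a quantization config
    print(type(quantization_config))  # <class 'NoneType'>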

> Targeted fix in TRL: While investigating and waiting for an upstream fix, it makes sense for TRL to guard specifically against this problem.

Yep, agree, done.


I will merge this PR now, as I'd like to include it in the release, but we should definitely follow up on this:

> It would be worth raising or linking an issue upstream so we can align behavior there.

qgallouedec changed the title from "Fix passing model kwargs" to "🗜 Hotfix: avoid passing quantization_config=None" on Sep 9, 2025
qgallouedec merged commit a647e5a into main on Sep 9, 2025 (9 of 11 checks passed)
qgallouedec deleted the fix-passing-model-kwargs branch on September 9, 2025 at 20:50