🗜 Hotfix: avoid passing `quantization_config=None` #4019
Conversation
Thanks for the PR: definitely, GPT OSS models should be supported in TRL!!!
That said, I'm not fully convinced we need to apply the "filter out all None kwargs" policy across the board. Most parameters in from_pretrained are robust to None: have you seen any issue with any parameter other than quantization_config?
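For reference, the blanket policy under discussion amounts to something like the following sketch (the helper name is hypothetical, not the actual PR diff):

```python
def strip_none_kwargs(kwargs: dict) -> dict:
    # Drop every kwarg whose value is None before forwarding the dict
    # to from_pretrained. This is the broad "filter out all None kwargs"
    # approach: it also rewrites unrelated arguments.
    return {key: value for key, value in kwargs.items() if value is not None}
```

As noted above, most `from_pretrained` parameters are robust to `None`, so this rewrites more kwargs than necessary.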
In my view, there are two complementary points here:
- Upstream issue in Transformers: do you know why transformers treats a missing `quantization_config` differently from `quantization_config=None`? I think this might be a bug. In my opinion, passing `None` should be treated the same as omitting the key altogether, not as an invalid quantization config. It would be worth raising or linking an issue upstream so we can align behavior there.
  - I can have a look at this.
- Targeted fix in TRL: while investigating and waiting for an upstream fix, it makes sense for TRL to guard specifically against this problem.

What about stripping only the `quantization_config` (and the associated `device_map`) if it is `None`? This would keep the fix minimal and avoid unnecessarily rewriting the kwargs for unrelated arguments. What do you think?
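The targeted guard being proposed could be sketched as follows (a minimal illustration with a hypothetical helper name, not the actual PR diff; the assumption that `device_map` should be dropped alongside `quantization_config` comes from the comment above):

```python
def strip_none_quantization_config(kwargs: dict) -> dict:
    # Targeted guard: only remove quantization_config when it was passed
    # explicitly as None, and drop the associated device_map with it.
    # All other kwargs (including other None values) are left untouched.
    kwargs = dict(kwargs)  # avoid mutating the caller's dict
    if "quantization_config" in kwargs and kwargs["quantization_config"] is None:
        del kwargs["quantization_config"]
        kwargs.pop("device_map", None)
    return kwargs
```

This keeps the fix scoped to the one argument known to misbehave, instead of rewriting the whole kwargs dict.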
yeah, good point.

no

I think it comes from these lines; probably a bug, but I haven't had time to look into it further.

yep, agree, done. I will merge this PR now, as I'd like to include it in the release, but we should definitely do this.
Passing `quantization_config=None` isn't the same as omitting it, causing the GPT OSS model to fail to load when used with the TRL CLI.