Fix: propagate quantization_config to text sub-config for composite models in AutoModelForCausalLM#45494

Merged
SunMarc merged 2 commits into huggingface:main from
lvliang-intel:fix/propagate-quantization-config-to-text-subconfig
Apr 21, 2026
Conversation

@lvliang-intel
Contributor

What does this PR do?

Fixes loading of quantized composite models (e.g. Qwen3.5-35B-A3B with AutoRound quantization) via AutoModelForCausalLM.from_pretrained.

Problem:
For composite models whose model_class.config_class maps to text_config, the from_pretrained method in auto_factory.py swaps the full composite config with the text sub-config:

if model_class.config_class == config.sub_configs.get("text_config", None):
    config = config.get_text_config()

The quantization_config is stored on the top-level composite config (from config.json), but not on the text sub-config. When modeling_utils.py later calls get_hf_quantizer, it checks hasattr(config, "quantization_config") to determine pre_quantized. Since the text sub-config lacks this attribute, pre_quantized is set to False, which causes a ValueError for quantization methods that require pre-quantized weights (e.g. AutoRound, GPTQ, AWQ).

ValueError: The quantization method QuantizationMethod.AUTOROUND does require the model to be pre-quantized.
You explicitly passed pre_quantized=False meaning your model weights are not quantized.
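The failure mode above can be shown with a minimal sketch. The config classes below are hypothetical stand-ins, not the transformers source; they only reproduce the shape of the check described: the quantization settings live on the composite config, so a hasattr check against the swapped-in text sub-config comes back False.

```python
class TextSubConfig:
    """Stands in for the text sub-config; it carries no quantization_config."""
    pass


class CompositeConfig:
    """Stands in for the top-level composite config loaded from config.json."""
    def __init__(self):
        self.quantization_config = {"quant_method": "auto-round", "bits": 4}
        self.text_config = TextSubConfig()


# After the swap, downstream code only sees the text sub-config
config = CompositeConfig().text_config

# The pre_quantized determination described above
pre_quantized = hasattr(config, "quantization_config")
print(pre_quantized)  # False -> ValueError for AutoRound/GPTQ/AWQ
```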

How to reproduce the issue:

import transformers

qconfig = transformers.AutoRoundConfig()
model = transformers.AutoModelForCausalLM.from_pretrained(
    "Intel/Qwen3.5-35B-A3B-int4-AutoRound",
    trust_remote_code=True,
    device_map="cuda",
    quantization_config=qconfig,
)

Fixes # (issue)

Fix:
After swapping to the text sub-config, propagate quantization_config from the parent composite config when the parent has one and the text sub-config does not already have one.
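The propagation step can be sketched as follows. This is a hedged illustration of the described behavior, not the PR's exact diff; the config classes are minimal stand-ins for the real transformers config objects.

```python
class TextSubConfig:
    """Stand-in for the text sub-config (no quantization_config of its own)."""
    pass


class CompositeConfig:
    """Stand-in for the top-level composite config from config.json."""
    def __init__(self):
        self.quantization_config = {"quant_method": "auto-round", "bits": 4}
        self.text_config = TextSubConfig()

    def get_text_config(self):
        return self.text_config


config = CompositeConfig()
text_config = config.get_text_config()

# Propagate quantization_config from the parent composite config if the
# text sub-config does not already carry one.
if hasattr(config, "quantization_config") and not hasattr(text_config, "quantization_config"):
    text_config.quantization_config = config.quantization_config

config = text_config
print(config.quantization_config["quant_method"])  # auto-round
```

With the attribute present on the swapped-in config, the hasattr check in get_hf_quantizer sets pre_quantized correctly and the ValueError no longer fires.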

@Rocketknight1
Member

cc @SunMarc for quants

Member

@SunMarc SunMarc left a comment

Thanks a lot! Left a suggestion and a comment.

Comment thread src/transformers/models/auto/auto_factory.py
…odels in AutoModelForCausalLM

Signed-off-by: lvliang-intel <liang1.lv@intel.com>
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
@lvliang-intel lvliang-intel force-pushed the fix/propagate-quantization-config-to-text-subconfig branch from aaf3b31 to c9981a2 Compare April 21, 2026 02:04
@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto

@SunMarc SunMarc enabled auto-merge April 21, 2026 12:21
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@SunMarc SunMarc left a comment


Nice !

@SunMarc SunMarc added this pull request to the merge queue Apr 21, 2026
Merged via the queue into huggingface:main with commit 85099df Apr 21, 2026
28 checks passed

4 participants