fix(quantization): Skip weight initialization for quantized models #41273
mauricioharley wants to merge 1 commit into huggingface:main
Conversation
if not is_quantized:
    self.initialize_weights()
quantized models shouldn't be handled differently. Maybe we need to set _is_hf_initialized somewhere in the modules that are impacted.
Thanks for the catch, @SunMarc! I restored the regular `initialize_weights()` path so quantized models no longer skip it, and added a hook in the quantizers to mark the tensors we create as `_is_hf_initialized`. That way the initialization logic still runs, but the quantized parameters stay untouched.
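For reference, a minimal sketch of that direction (an assumption about the shape of the hook, not the exact patch in this PR): after the quantizer materializes its non-floating-point tensors, the affected modules get flagged so the initialization loop leaves them alone.

```python
# Hypothetical sketch, not the actual hook added in this PR: flag modules whose
# parameters were created by the quantizer so initialize_weights() skips them.
import torch.nn as nn

def mark_quantized_modules_initialized(model: nn.Module) -> None:
    for module in model.modules():
        params = list(module.parameters(recurse=False))
        # Quantized weights are detected here via their non-float dtype (e.g. int8).
        if params and all(not p.is_floating_point() for p in params):
            # The initialization loop checks this flag and leaves such modules untouched.
            module._is_hf_initialized = True
```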
Force-pushed from d006d84 to e6d835b
This commit addresses the RuntimeError encountered when loading llmcompressor W8A8 quantized models, where `torch.nn.init.normal_()` is called on `int8` tensors during weight initialization.

Fixes huggingface#39366

Signed-off-by: Mauricio Harley <mauricioharley@gmail.com>
Force-pushed from e6d835b to 1f0ce19
[For maintainers] Suggested jobs to run (before merge)

run-slow: fbgemm_fp8, finegrained_fp8, higgs, hqq, mxfp4
ArthurZucker left a comment
Hey! Sorry for the radio silence. If you still want to work on this, note that we merged #41580, which changes this a lot!
Description
This Pull Request fixes a `RuntimeError` that occurs when loading llmcompressor W8A8 quantized models (e.g., `RedHatAI/Qwen2.5-VL-7B-Instruct-quantized.w8a8`) due to an attempt to initialize `int8` weights using `torch.nn.init.normal_()`, which only supports floating-point dtypes.

The issue was identified in `modeling_utils.py`, within the `_initialize_missing_keys` method. When `is_quantized` is `True`, the `else` branch was still calling `self.initialize_weights()`, leading to the `RuntimeError`.
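For context, the failure can be reproduced outside transformers in a couple of lines, since `normal_()` rejects integer dtypes (the exact error text varies across PyTorch versions):

```python
import torch

# Stand-in for an llmcompressor W8A8 weight tensor loaded as int8.
w = torch.zeros(4, 4, dtype=torch.int8)

try:
    torch.nn.init.normal_(w, mean=0.0, std=0.02)
except RuntimeError as err:
    # Recent PyTorch reports that the normal_ kernel is not implemented for
    # integer dtypes; wording differs between versions.
    print(err)
```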
Proposed Change

Added a conditional check `if not is_quantized:` before the call to `self.initialize_weights()` in the `else` branch of the `_initialize_missing_keys` method. This ensures that weight initialization is skipped for quantized models, as their weights are either already defined or will be loaded from a pretrained state dictionary, making the initialization redundant and problematic. A toy illustration of the guard's effect follows.
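The sketch below is a simplified stand-in, not the real `_initialize_missing_keys` signature; it only shows why guarding the call avoids the error:

```python
import torch
import torch.nn as nn

def initialize_missing(weight: torch.Tensor, is_quantized: bool) -> None:
    # Hypothetical stand-in for the else branch: only re-initialize when the
    # checkpoint is not quantized, so int8 weights never reach normal_().
    if not is_quantized:
        nn.init.normal_(weight, mean=0.0, std=0.02)

initialize_missing(torch.empty(2, 2, dtype=torch.float32), is_quantized=False)  # runs normal_
initialize_missing(torch.zeros(2, 2, dtype=torch.int8), is_quantized=True)      # skipped, no error
```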
Related Issue

Closes #39366