fix(quantization): Skip weight initialization for quantized models #41273
mauricioharley wants to merge 1 commit into huggingface:main
Conversation
if not is_quantized:
    self.initialize_weights()
quantized models shouldn't be handled differently. Maybe we need to set _is_hf_initialized somewhere in the modules that are impacted.
Thanks for the catch, @SunMarc! I restored the regular `initialize_weights()` path so quantized models no longer skip it, and added a hook in the quantizers to mark the tensors we create as `_is_hf_initialized`. That way the initialization logic still runs, but the quantized parameters stay untouched.
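For reference, a minimal sketch of that direction (an assumption about the shape of the hook, not the exact patch in this PR): after the quantizer materializes its non-floating-point tensors, the affected modules get flagged so the initialization loop leaves them alone.

```python
# Hypothetical sketch, not the actual hook added in this PR: flag modules whose
# parameters were created by the quantizer so initialize_weights() skips them.
import torch.nn as nn

def mark_quantized_modules_initialized(model: nn.Module) -> None:
    for module in model.modules():
        params = list(module.parameters(recurse=False))
        # Quantized weights are detected here via their non-float dtype (e.g. int8).
        if params and all(not p.is_floating_point() for p in params):
            # The initialization loop checks this flag and leaves such modules untouched.
            module._is_hf_initialized = True
```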
Force-pushed from d006d84 to e6d835b
This commit addresses the RuntimeError encountered when loading llmcompressor W8A8 quantized models, where `torch.nn.init.normal_()` is called on `int8` tensors during weight initialization.

Fixes huggingface#39366

Signed-off-by: Mauricio Harley <mauricioharley@gmail.com>
Force-pushed from e6d835b to 1f0ce19
[For maintainers] Suggested jobs to run (before merge)

run-slow: fbgemm_fp8, finegrained_fp8, higgs, hqq, mxfp4
ArthurZucker left a comment
Hey! Sorry for the radio silence. If you still want to work on this, note that we merged #41580, which changes this a lot!
Description
This Pull Request fixes a `RuntimeError` that occurs when loading llmcompressor W8A8 quantized models (e.g., `RedHatAI/Qwen2.5-VL-7B-Instruct-quantized.w8a8`) due to an attempt to initialize `int8` weights using `torch.nn.init.normal_()`, which only supports floating-point dtypes.

The issue was identified in `modeling_utils.py`, within the `_initialize_missing_keys` method. When `is_quantized` is `True`, the `else` branch was still calling `self.initialize_weights()`, leading to the `RuntimeError`.
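For context, the failure can be reproduced outside transformers in a couple of lines, since `normal_()` rejects integer dtypes (the exact error text varies across PyTorch versions):

```python
import torch

# Stand-in for an llmcompressor W8A8 weight tensor loaded as int8.
w = torch.zeros(4, 4, dtype=torch.int8)

try:
    torch.nn.init.normal_(w, mean=0.0, std=0.02)
except RuntimeError as err:
    # Recent PyTorch reports that the normal_ kernel is not implemented for
    # integer dtypes; wording differs between versions.
    print(err)
```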
Proposed Change

Added a conditional check `if not is_quantized:` before the call to `self.initialize_weights()` in the `else` branch of the `_initialize_missing_keys` method. This ensures that weight initialization is skipped for quantized models, as their weights are either already defined or will be loaded from a pretrained state dictionary, making the initialization redundant and problematic. A toy illustration of the guard's effect follows.
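The sketch below is a simplified stand-in, not the real `_initialize_missing_keys` signature; it only shows why guarding the call avoids the error:

```python
import torch
import torch.nn as nn

def initialize_missing(weight: torch.Tensor, is_quantized: bool) -> None:
    # Hypothetical stand-in for the else branch: only re-initialize when the
    # checkpoint is not quantized, so int8 weights never reach normal_().
    if not is_quantized:
        nn.init.normal_(weight, mean=0.0, std=0.02)

initialize_missing(torch.empty(2, 2, dtype=torch.float32), is_quantized=False)  # runs normal_
initialize_missing(torch.zeros(2, 2, dtype=torch.int8), is_quantized=True)      # skipped, no error
```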
Related Issue

Closes #39366