fix: add identity reverse_op to dequantize ops for save_pretrained #44983
Conversation
Force-pushed from 1742755 to 9c59dda
cc @SunMarc for quantization maybe?
Hmmm, this shouldn't trigger a reverse op when we've dequantized the model. I think the right behavior here would be to just save the model in its dequantized form.
Force-pushed from 9c59dda to b676da0
@SunMarc Thanks for the review! Updated the PR based on your feedback.
[For maintainers] Suggested jobs to run (before merge): run-slow: mxfp4
Force-pushed from cd2c8fe to 13f9355
SunMarc left a comment:
Left a few questions, thanks for iterating.
    if self.quantization_config.dequantize and hasattr(model.config, "quantization_config"):
        del model.config.quantization_config
When calling dequantize, I think there is a function that is triggered to remove all traces of quantization, no? Maybe we don't need to do this.
You're right. `remove_quantization_config` in `base.py` already handles this during `postprocess_model`. Removed the mxfp4-specific deletion.
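For readers following along, a minimal sketch of what such a cleanup helper might look like; the names `remove_quantization_config`, `postprocess_model`, `hf_quantizer`, and `quantization_config` come from this thread, but the body below is an illustrative assumption, not the actual Transformers implementation.

```python
# Illustrative sketch only -- assumes a post-processing hook that strips
# quantization traces once the model has been fully dequantized.
def remove_quantization_config(model):
    # Drop the serialized quantization settings so the saved config.json
    # no longer advertises a quantized checkpoint.
    if hasattr(model.config, "quantization_config"):
        del model.config.quantization_config
    # Drop the quantizer handle attached during loading.
    if hasattr(model, "hf_quantizer"):
        del model.hf_quantizer
```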
    @property
    def reverse_op(self) -> ConversionOps:
        return Mxfp4IdentityOp()
Even after dequantizing, do we still need the reverse ops? Can you also check the behavior with fp8 when dequantize is called?
The same issue exists there. Tested with `Qwen/Qwen3-0.6B-FP8` using `FineGrainedFP8Config(dequantize=True)`: `save_pretrained` also raises `NotImplementedError` because `_weight_conversions` remains after `remove_quantization_config`.
So the fix is now in `base.py`: `remove_quantization_config` clears `_weight_conversions` alongside `hf_quantizer` and `quantization_config`. This makes the mxfp4-specific `reverse_op` and config removal unnecessary, so I've removed them. The fix covers both mxfp4 and fp8.
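For reference, a minimal reproduction sketch of the FP8 failure described above, using the model and config named in this thread; the output directory is arbitrary and the behavior is as reported here, not re-verified.

```python
# Reproduction sketch of the pre-fix failure described above.
from transformers import AutoModelForCausalLM, FineGrainedFP8Config

# Load an FP8 checkpoint and ask for it to be dequantized on load.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B-FP8",
    quantization_config=FineGrainedFP8Config(dequantize=True),
)

# Before the fix this raised NotImplementedError: the recorded
# Fp8Dequantize conversion had no reverse_op to run at save time.
model.save_pretrained("qwen3-0.6b-dequantized")
```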
Force-pushed from 13f9355 to ae3e643
Updated based on review feedback. The fix is now a 2-line change: clearing `_weight_conversions` in `remove_quantization_config`. Verified with both affected quantizers: MXFP4 (`openai/gpt-oss-20b`) and FP8 (`Qwen/Qwen3-0.6B-FP8`).
Force-pushed from ae3e643 to 9553783
    if hasattr(model, "_weight_conversions"):
        del model._weight_conversions
Hmmm, but still, we shouldn't just remove all weight_conversions; not all weight_conversions are related to quantization. Can we find a way to remove only the weight conversions that make sense? Otherwise, we can just update the weight conversion ops if that is simpler.
Didn't realize not all weight_conversions are quantization-related; thanks for catching that.
Between the two options, adding an identity `reverse_op` on each dequantize op seemed simpler, so I went with that. Added `_IdentityOp` to `core_model_loading.py` and set it as the `reverse_op` on `Mxfp4Dequantize`, `Fp8Dequantize`, and `MetalDequantize` (all three had the same issue); a sketch of the shape of that change follows the list below.
Tested save + reload on:
- MXFP4: `openai/gpt-oss-20b`
- FP8: `Qwen/Qwen3-0.6B-FP8`
- Metal: `medmekk/Llama-3.2-1B-Instruct-metal`
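A rough sketch of the shape of this change, assuming the `ConversionOps` interface referenced in this thread; the import path and the `convert` method name are assumptions, not the exact `core_model_loading.py` code.

```python
# Illustrative sketch, not the exact Transformers implementation.
# Assumption: ConversionOps lives in core_model_loading.py (the file
# referenced in this thread) and exposes a conversion method plus a
# reverse_op property.
from transformers.core_model_loading import ConversionOps


class _IdentityOp(ConversionOps):
    """Pass-through op: dequantized weights are already in their target
    dtype, so "reversing" the load-time conversion returns them unchanged."""

    def convert(self, value, *args, **kwargs):  # method name is an assumption
        return value

    @property
    def reverse_op(self) -> "ConversionOps":
        # The identity op is its own reverse.
        return _IdentityOp()


class Mxfp4Dequantize(ConversionOps):
    # ... dequantization logic elided ...

    @property
    def reverse_op(self) -> "ConversionOps":
        # Previously raised NotImplementedError; now weights are saved as-is.
        return _IdentityOp()
```

`Fp8Dequantize` and `MetalDequantize` get the same `reverse_op` wiring.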
Force-pushed from de90e5b to ece5d90
Force-pushed from ece5d90 to 61347c0
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…ined
Dequantize operations (Mxfp4Dequantize, Fp8Dequantize, MetalDequantize) raise NotImplementedError on reverse_op, causing save_pretrained to fail for models loaded with dequantize=True. Add _IdentityOp as the reverse_op so dequantized weights are saved as-is.
Force-pushed from 0d756c1 to 4212234
…uggingface#44983)

fix: add identity reverse_op to dequantize operations for save_pretrained

Dequantize operations (Mxfp4Dequantize, Fp8Dequantize, MetalDequantize) raise NotImplementedError on reverse_op, causing save_pretrained to fail for models loaded with dequantize=True. Add _IdentityOp as the reverse_op so dequantized weights are saved as-is.

The _IdentityOp class (added by PR #44983) was accidentally deleted during the MoE expert parallelism work. It is needed by finegrained_fp8.py and metal_quantization.py as a pass-through reverse_op for dequantize operations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* MoE expert parallelism + sequence parallelism
  - Add PackedColwiseParallel for fused gate_up_proj weights
  - Add MoEExpertsParallel with per-expert DTensor sharding
  - Add PrepareModuleInputOutput for SP allgather/split hooks
  - Add _AllReduceBackward for MoE routing weight gradients
  - Extend TPStyle with moe_experts, packed_colwise, activation, module kinds
  - _StridedShard handling in core_model_loading for interleaved weights
  - MoE model configs: mixtral, deepseek_v3, qwen3 with SP plans
  - DTensor rotary_pos_emb guard for mixtral
* Fix ruff linting and formatting
* Fix ruff formatting in core_model_loading.py
* Restore _IdentityOp accidentally removed in 25a1f48
  The _IdentityOp class (added by PR #44983) was accidentally deleted during the MoE expert parallelism work. It is needed by finegrained_fp8.py and metal_quantization.py as a pass-through reverse_op for dequantize operations.
  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Backport new TP/FSDP API + fix DTensor imports in Copied-from models
* from_pretrained orchestration + distributed save/load (#45409)
* from_pretrained orchestration + save/load
  - Add gather_full_state_dict() for DTensor→full tensor saving
  - Add convert_strided_to_shard() / restore_strided_from_shard() for DCP
  - Add _redistribute_dtensor() helper
  - Full distributed_config integration in from_pretrained/save_pretrained
  - Rename apply_fsdp2 → apply_fully_shard_data_parallel
  - save_optimizer() / load_optimizer() in distributed/utils
  - Trainer integration with distributed_config
  - Updated FSDP and TP tests for new orchestration API
  - DTensor shard-on-read test updates
* revert distributed utils
* eaaea
* all tests for core modeling are passing
* populate import from init for tp
* ruff
* ruff

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
What does this PR do?
Fixes `save_pretrained()` for models loaded with `dequantize=True`.

`save_pretrained` calls `reverse_op` on all weight conversion operations from loading. Dequantize ops (`Mxfp4Dequantize`, `Fp8Dequantize`, `MetalDequantize`) don't have `reverse_op` implemented, so it raises `NotImplementedError`. Since dequantized weights are already in their target dtype, the reverse op should just pass them through. Added `_IdentityOp` in `core_model_loading.py` and set it as `reverse_op` on all three dequantize operations.

Tested with:
- `openai/gpt-oss-20b` + `Mxfp4Config(dequantize=True)`
- `Qwen/Qwen3-0.6B-FP8` + `FineGrainedFP8Config(dequantize=True)`
- `medmekk/Llama-3.2-1B-Instruct-metal` + `MetalConfig(dequantize=True)`

All three: save ✓, no `quantization_config` in saved config.json ✓, reload ✓ (0 meta params).
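For anyone reproducing the checks above, a hedged sketch of the MXFP4 verification flow; the model name and config come from this PR, while the output directory and the meta-parameter check are illustrative.

```python
# Verification sketch for the MXFP4 case listed above.
from transformers import AutoModelForCausalLM, Mxfp4Config

# Load the MXFP4 checkpoint and dequantize it on load.
model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    quantization_config=Mxfp4Config(dequantize=True),
)

# With the identity reverse_op in place, saving no longer raises
# NotImplementedError, and the saved config.json carries no quantization_config.
model.save_pretrained("gpt-oss-20b-dequantized")

# Reload and confirm every parameter materialized (no meta tensors).
reloaded = AutoModelForCausalLM.from_pretrained("gpt-oss-20b-dequantized")
assert not any(p.device.type == "meta" for p in reloaded.parameters())
```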