
Fix MXFP4 quantizer validation to allow CPU inference with dequantize option #39953

Merged
MekkCyber merged 4 commits into huggingface:main from returnL:fix/mxfp4-cpu-dequantize-validation on Aug 6, 2025

Conversation

returnL (Contributor) commented on Aug 6, 2025

What does this PR do?

This PR fixes a bug that prevented MXFP4 models from running on CPU when quantization_config.dequantize=True was set.

Problem

The validation logic in Mxfp4HfQuantizer checked CUDA availability before checking the dequantize flag, so loading failed in CPU-only environments even when dequantization was enabled.

Solution

Reordered the validation checks so the dequantize configuration is honored first (see the sketch after this list):

  1. Check whether dequantize is enabled; if so, skip the GPU validations.
  2. Only then check CUDA availability.
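
A minimal sketch of the reordered check, using a simplified stand-in for the real Mxfp4HfQuantizer (the upstream validate_environment in quantizer_mxfp4.py carries additional logic):

```python
import torch

class Mxfp4HfQuantizerSketch:
    """Simplified stand-in for Mxfp4HfQuantizer, for illustration only."""

    def __init__(self, quantization_config):
        self.quantization_config = quantization_config

    def validate_environment(self, *args, **kwargs):
        # Step 1: honor dequantize first. With dequantize=True the
        # weights are converted to BF16, so no GPU is needed.
        if self.quantization_config.dequantize:
            return
        # Step 2: only now require CUDA, since true MXFP4 kernels
        # run on GPU.
        if not torch.cuda.is_available():
            raise RuntimeError(
                "MXFP4 quantized inference requires a GPU; set "
                "quantization_config.dequantize=True for CPU inference."
            )
```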

Changes Made

  • Fix: Moved the dequantize check before the CUDA validation in quantizer_mxfp4.py
  • Tests: Added test cases verifying CPU inference with dequantize=True (a hedged sketch follows this list)
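
A hedged sketch of the kind of test this adds, reusing the stand-in class above (the PR's actual test names and setup may differ):

```python
from types import SimpleNamespace
from unittest.mock import patch

def test_cpu_inference_allowed_with_dequantize():
    quantizer = Mxfp4HfQuantizerSketch(SimpleNamespace(dequantize=True))
    # Simulate a CPU-only environment; validation must not raise.
    with patch("torch.cuda.is_available", return_value=False):
        quantizer.validate_environment()
```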

Before submitting

  • Did you read the contributor guideline?
  • Did you write any new necessary tests?

Who can review?

@SunMarc @MekkCyber

returnL added 3 commits August 6, 2025 17:05
Move dequantize check before CUDA availability check to allow
CPU inference when quantization_config.dequantize is True.
This enables users to run MXFP4 models on CPU by automatically
converting them to BF16 format.
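
For reference, a usage sketch of what this enables; the checkpoint name and the exact Mxfp4Config signature are assumptions about the transformers API, not something this commit prescribes:

```python
from transformers import AutoModelForCausalLM, Mxfp4Config

# Assumed example checkpoint; any MXFP4-quantized model applies.
model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    quantization_config=Mxfp4Config(dequantize=True),  # weights dequantized to BF16
    device_map="cpu",
)
```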
github-actions (bot) commented on Aug 6, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: mxfp4

MekkCyber (Contributor) left a comment

LGTM! Thanks for fixing this and thanks for adding tests 🤗

SunMarc (Member) left a comment

Thanks!

@HuggingFaceDocBuilderDev commented
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@MekkCyber MekkCyber merged commit dd70a8c into huggingface:main Aug 6, 2025
24 checks passed
@returnL returnL deleted the fix/mxfp4-cpu-dequantize-validation branch August 6, 2025 16:19
@SunMarc SunMarc added the for patch label Aug 6, 2025
ArthurZucker pushed a commit that referenced this pull request Aug 13, 2025
Fix MXFP4 quantizer validation to allow CPU inference with dequantize option (#39953)

* Fix MXFP4 quantizer validation to enable CPU dequantization

Move dequantize check before CUDA availability check to allow
CPU inference when quantization_config.dequantize is True.
This enables users to run MXFP4 models on CPU by automatically
converting them to BF16 format.

* Add tests for MXFP4 quantizer CPU dequantization validation

* fix: format mxfp4 test file with ruff
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
