
Fix MXFP4 quantizer validation to allow CPU inference with dequantize option #39953

Merged
MekkCyber merged 4 commits into huggingface:main from returnL:fix/mxfp4-cpu-dequantize-validation on Aug 6, 2025

Conversation

returnL (Contributor) commented on Aug 6, 2025

What does this PR do?

This PR fixes a bug that prevented MXFP4 models from running on CPU when quantization_config.dequantize=True was set.

Problem

The validation logic in Mxfp4HfQuantizer checked CUDA availability before checking the dequantize flag, so loading failed in CPU-only environments even when dequantization was enabled.

Solution

Reordered the validation checks so the dequantize configuration is honored first (see the sketch after this list):

  1. Check whether dequantize is enabled; if so, skip the GPU validations.
  2. Only then check CUDA availability.
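
A minimal sketch of the reordered check, using a simplified stand-in for the real Mxfp4HfQuantizer (the upstream validate_environment in quantizer_mxfp4.py carries additional logic):

```python
import torch

class Mxfp4HfQuantizerSketch:
    """Simplified stand-in for Mxfp4HfQuantizer, for illustration only."""

    def __init__(self, quantization_config):
        self.quantization_config = quantization_config

    def validate_environment(self, *args, **kwargs):
        # Step 1: honor dequantize first. With dequantize=True the
        # weights are converted to BF16, so no GPU is needed.
        if self.quantization_config.dequantize:
            return
        # Step 2: only now require CUDA, since true MXFP4 kernels
        # run on GPU.
        if not torch.cuda.is_available():
            raise RuntimeError(
                "MXFP4 quantized inference requires a GPU; set "
                "quantization_config.dequantize=True for CPU inference."
            )
```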

Changes Made

  • Fix: Moved the dequantize check before the CUDA validation in quantizer_mxfp4.py
  • Tests: Added test cases verifying CPU inference with dequantize=True (a hedged sketch follows this list)
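
A hedged sketch of the kind of test this adds, reusing the stand-in class above (the PR's actual test names and setup may differ):

```python
from types import SimpleNamespace
from unittest.mock import patch

def test_cpu_inference_allowed_with_dequantize():
    quantizer = Mxfp4HfQuantizerSketch(SimpleNamespace(dequantize=True))
    # Simulate a CPU-only environment; validation must not raise.
    with patch("torch.cuda.is_available", return_value=False):
        quantizer.validate_environment()
```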

Before submitting

  • Did you read the contributor guideline?
  • Did you write any new necessary tests?

Who can review?

@SunMarc @MekkCyber

returnL added 3 commits August 6, 2025 17:05
Move dequantize check before CUDA availability check to allow
CPU inference when quantization_config.dequantize is True.
This enables users to run MXFP4 models on CPU by automatically
converting them to BF16 format.
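
For reference, a usage sketch of what this enables; the checkpoint name and the exact Mxfp4Config signature are assumptions about the transformers API, not something this commit prescribes:

```python
from transformers import AutoModelForCausalLM, Mxfp4Config

# Assumed example checkpoint; any MXFP4-quantized model applies.
model = AutoModelForCausalLM.from_pretrained(
    "openai/gpt-oss-20b",
    quantization_config=Mxfp4Config(dequantize=True),  # weights dequantized to BF16
    device_map="cpu",
)
```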
github-actions (bot) commented on Aug 6, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: mxfp4

MekkCyber (Contributor) left a comment

LGTM! Thanks for fixing this and thanks for adding tests 🤗

SunMarc (Member) left a comment

Thanks!

@HuggingFaceDocBuilderDev commented
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@MekkCyber MekkCyber merged commit dd70a8c into huggingface:main Aug 6, 2025
24 checks passed
@returnL returnL deleted the fix/mxfp4-cpu-dequantize-validation branch August 6, 2025 16:19
@SunMarc SunMarc added the for patch label Aug 6, 2025
ArthurZucker pushed a commit that referenced this pull request Aug 13, 2025
Fix MXFP4 quantizer validation to allow CPU inference with dequantize option (#39953)

* Fix MXFP4 quantizer validation to enable CPU dequantization

Move dequantize check before CUDA availability check to allow
CPU inference when quantization_config.dequantize is True.
This enables users to run MXFP4 models on CPU by automatically
converting them to BF16 format.

* Add tests for MXFP4 quantizer CPU dequantization validation

* fix: format mxfp4 test file with ruff
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
