
Fix fp8 UT bugs #42442

Closed

YangKai0616 wants to merge 4 commits into huggingface:main from YangKai0616:main

Conversation

@YangKai0616
Contributor

What does this PR do?

When running the tests with the command pytest -rA tests/quantization/finegrained_fp8/test_fp8.py, the following failures are currently reported:

FAILED tests/quantization/finegrained_fp8/test_fp8.py::FP8QuantizerTest::test_quantized_model - AssertionError: 'Once upon a time, there was a man who was a farmer.' != 'Once upon a time, there was a man who was very rich.'
FAILED tests/quantization/finegrained_fp8/test_fp8.py::FP8QuantizerTest::test_quantized_model_conversion - AssertionError: 122 != 146
FAILED tests/quantization/finegrained_fp8/test_fp8.py::FP8QuantizerTest::test_quantized_model_multi_accelerator - AssertionError: 'Once upon a time, there was a man who was a farmer.' != 'Once upon a time, there was a man who was very rich.'
FAILED tests/quantization/finegrained_fp8/test_fp8.py::FP8QuantizerTest::test_save_pretrained - AssertionError: 'Once upon a time, there was a man who was a farmer.' != 'Once upon a time, there was a man who was very rich.'
FAILED tests/quantization/finegrained_fp8/test_fp8.py::FP8QuantizerTest::test_save_pretrained_offload - AssertionError: 'Once upon a time, there was a man who was a farmer.' != 'Once upon a time, there was a man who was very rich.'

The errors can be categorized into two types:

  1. Model output does not match EXPECTED_OUTPUT

Root cause: the _dtype used here was the meta model's parameter dtype, while the EXPECTED_OUTPUT appears to have been obtained with a fixed dtype of torch.float32. After I explicitly set _dtype to torch.float32, the result matched the expected output (see the first sketch after this list).

This PR updates the EXPECTED_OUTPUT accordingly. Please let me know your insights, thx!

  2. nb_linears - 25 ≠ nb_fp8_linear

Root cause: the replacement operation here can produce patterns like XX.fc1.XX → XX.fc*.XX, in which case the layer name fc1 listed in modules_to_not_convert fails to match the actual layer (see the second sketch after this list).

This PR fixes the string matching logic.
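
For reference, here are two small self-contained sketches of the issues above. Both are illustrations only: the checkpoint name, module path, and skip list are placeholders for demonstration, not necessarily the values used by the test.

The first sketch shows why the generated continuation, and therefore EXPECTED_OUTPUT, depends on the dtype the weights are loaded in:

```python
# Illustrative only: a placeholder checkpoint; torch_dtype pins the load dtype instead of
# inheriting it from the meta model's parameters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "..."  # checkpoint used by FP8QuantizerTest, left unspecified here
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = tokenizer("Once upon a time, there was a man who was", return_tensors="pt")

for dtype in (torch.float32, torch.bfloat16):
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype)
    out = model.generate(**prompt, max_new_tokens=8, do_sample=False)
    # Even greedy decoding can yield different continuations under different dtypes,
    # because the logits differ slightly.
    print(dtype, tokenizer.decode(out[0], skip_special_tokens=True))
```

The second sketch reproduces the matching failure caused by rewriting digits before the modules_to_not_convert check, and shows that checking against the dotted name's components avoids it:

```python
import re

modules_to_not_convert = ["lm_head", "fc1"]      # skip list; illustrative values
current_key_name = "model.decoder.layers.0.fc1"  # module path; illustrative value

# Substring check after digit rewriting: "fc1" becomes "fc*", so the skip list no longer matches.
current_key_name_str = re.sub(r"\d+", "*", current_key_name)
print(current_key_name_str)                                                   # model.decoder.layers.*.fc*
print(any(key in current_key_name_str for key in modules_to_not_convert))    # False -> layer would be converted

# Part-based check (the direction of the fix): compare against the dotted-name components.
current_key_name_parts = current_key_name.split(".")
print(any(key in current_key_name_parts for key in modules_to_not_convert))  # True -> layer is skipped
```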

@YangKai0616
Contributor Author

@SunMarc , @ydshieh , please help review, thanks!

Member

@SunMarc left a comment


Thanks! Left a comment


```diff
 current_key_name_str = re.sub(r"\d+", "*", current_key_name)
-if not any(key in current_key_name_str for key in (modules_to_not_convert or [])):
+if not any(key in current_key_name_parts for key in (modules_to_not_convert or [])):
```
Member


Maybe we can use the same logic as mxfp4? We have the should_convert_module function for this. Indeed, the logic here changed recently and needs to be fixed.
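
As a rough point of reference, a minimal sketch of the kind of check such a helper performs could look like the following. The signature and body are assumptions for illustration, not the actual should_convert_module implementation in transformers:

```python
import re

def should_convert_module(current_key_name, patterns):
    """Return True if the module addressed by current_key_name should be converted.

    Sketch only: it matches the skip patterns against the dotted module name without
    rewriting digits, so components like "fc1" stay intact.
    """
    if not patterns:
        return True
    name = ".".join(current_key_name)
    return not any(re.search(rf"(^|\.){re.escape(pattern)}(\.|$)", name) for pattern in patterns)

# Illustrative usage:
print(should_convert_module(["model", "decoder", "layers", "0", "fc1"], ["fc1", "lm_head"]))       # False
print(should_convert_module(["model", "decoder", "layers", "0", "self_attn", "q_proj"], ["fc1"]))  # True
```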

Contributor Author


Done.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: finegrained_fp8

@SunMarc
Member

SunMarc commented Nov 27, 2025

These are a bit more complex to fix due to the refactor, so I decided to open a PR to fix them. I've added you as a co-author. Thanks for running those tests ;)

@YangKai0616
Contributor Author

Alright, I will follow the changes in the new PR and close this PR.

