[fix][acc] fix accuracy of fp8 attn weights model using ptpc quant recipe#670
Motivation
The Quark models amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 and amd/Kimi-K2-Thinking-MXFP4-AttnFP8 have FP8-weight linear layers in attention and adopt the PTPC (per-token, per-channel) quant recipe. However, the current code in ATOM forces block-scale quantization in `_fuse_rmsnorm_quant`. This PR fixes that issue.
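To illustrate why forcing the wrong granularity hurts accuracy, here is a minimal NumPy sketch (not ATOM's actual code) contrasting the two FP8 scaling granularities involved: per-token scaling, where each token row gets its own scale, versus block scaling, where each fixed-width block of columns shares a scale. The FP8 max below assumes the e4m3 format; function names are illustrative only.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 e4m3

def quant_per_token(x):
    """PTPC-style activation quant: one scale per token (row)."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / FP8_E4M3_MAX
    scale = np.maximum(scale, 1e-12)                # guard against all-zero rows
    q = np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def quant_block_scale(x, block=128):
    """Block-scale quant: one scale per contiguous block of `block` columns."""
    t, h = x.shape
    xb = x.reshape(t, h // block, block)
    scale = np.abs(xb).max(axis=-1, keepdims=True) / FP8_E4M3_MAX
    scale = np.maximum(scale, 1e-12)
    q = np.clip(xb / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q.reshape(t, h), scale.squeeze(-1)

x = np.random.randn(4, 256).astype(np.float32)
q_tok, s_tok = quant_per_token(x)
q_blk, s_blk = quant_block_scale(x)
assert s_tok.shape == (4, 1)   # one scale per token
assert s_blk.shape == (4, 2)   # one scale per 128-wide block
```

A model calibrated for per-token scales will see a different quantization error profile if the runtime silently applies block scales instead, which matches the accuracy drop observed below.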
Technical Details
`_fuse_rmsnorm_quant` should select the correct quant type based on the quant config/recipe. For per-token quant, a new kernel, `fused_qk_rmsnorm_per_token_quant`, was added in aiter; see ROCm/aiter#2958.
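The selection logic can be sketched as follows. This is a hypothetical simplification, not ATOM's actual code: the recipe keys and the block-scale kernel name are placeholders; only `fused_qk_rmsnorm_per_token_quant` is the real aiter kernel named above.

```python
def select_fused_rmsnorm_quant_kernel(quant_config: dict) -> str:
    """Pick the fused RMSNorm+quant kernel by recipe instead of
    unconditionally forcing the block-scale path."""
    recipe = quant_config.get("recipe")
    if recipe == "ptpc":
        # Per-token quant path, backed by the new aiter kernel
        # (ROCm/aiter#2958).
        return "fused_qk_rmsnorm_per_token_quant"
    if recipe == "block":
        # The previously hard-coded path (placeholder name).
        return "fused_qk_rmsnorm_block_scale_quant"
    raise ValueError(f"unsupported quant recipe: {recipe!r}")

assert (select_fused_rmsnorm_quant_kernel({"recipe": "ptpc"})
        == "fused_qk_rmsnorm_per_token_quant")
```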
Test Plan
gsm8k accuracy is validated with and without this PR on amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 and amd/Kimi-K2-Thinking-MXFP4-AttnFP8, under both ATOM and vLLM-ATOM.
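For reproducibility, the evaluation settings shown in the results below correspond to an lm-evaluation-harness invocation along these lines (a sketch assuming the model is already being served on port 8000; the exact command used is not stated in this PR):

```shell
lm_eval --model local-completions \
  --model_args model=/workspace/shared/data/amd_int/models/Kimi-K2-Thinking-MXFP4-AttnFP8,base_url=http://localhost:8000/v1/completions,num_concurrent=65,max_retries=3,tokenized_requests=False \
  --tasks gsm8k --num_fewshot 3 --batch_size 1
```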
Test Result
Main branch:
amd/Kimi-K2-Thinking-MXFP4-AttnFP8
ATOM:
local-completions ({'model': '/workspace/shared/data/amd_int/models/Kimi-K2-Thinking-MXFP4-AttnFP8', 'base_url': 'http://localhost:8000/v1/completions', 'num_concurrent': 65, 'max_retries': 3, 'tokenized_requests': False}), gen_kwargs: ({}), limit: None, num_fewshot: 3, batch_size: 1

|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|--------|-----:|--------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 3|exact_match|↑ |0.8529|± |0.0098|
| | |strict-match | 3|exact_match|↑ |0.8514|± |0.0098|

vLLM-ATOM:

local-completions ({'model': '/workspace/shared/data/amd_int/models/Kimi-K2-Thinking-MXFP4-AttnFP8', 'base_url': 'http://localhost:8000/v1/completions', 'num_concurrent': 65, 'max_retries': 3, 'tokenized_requests': False}), gen_kwargs: ({}), limit: None, num_fewshot: 3, batch_size: 1

|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|--------|-----:|--------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 3|exact_match|↑ |0.8491|± |0.0099|
| | |strict-match | 3|exact_match|↑ |0.8431|± |0.0100|

Both ATOM and vLLM-ATOM drop to ~0.85, which is lower than expected.
amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4
vLLM-ATOM:
local-completions ({'model': '/workspace/shared/data/amd_int/models/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4', 'base_url': 'http://localhost:8000/v1/completions', 'num_concurrent': 65, 'max_retries': 3, 'tokenized_requests': False}), gen_kwargs: ({}), limit: None, num_fewshot: 3, batch_size: 1

|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|--------|-----:|--------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 3|exact_match|↑ |0.9393|± |0.0066|
| | |strict-match | 3|exact_match|↑ |0.9363|± |0.0067|

This PR:
amd/Kimi-K2-Thinking-MXFP4-AttnFP8
ATOM:
local-completions ({'model': '/workspace/shared/data/amd_int/models/Kimi-K2-Thinking-MXFP4-AttnFP8', 'base_url': 'http://localhost:8000/v1/completions', 'num_concurrent': 65, 'max_retries': 3, 'tokenized_requests': False}), gen_kwargs: ({}), limit: None, num_fewshot: 3, batch_size: 1

|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|--------|-----:|--------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 3|exact_match|↑ |0.9333|± |0.0069|
| | |strict-match | 3|exact_match|↑ |0.9340|± |0.0068|

vLLM-ATOM:

local-completions ({'model': '/workspace/shared/data/amd_int/models/Kimi-K2-Thinking-MXFP4-AttnFP8', 'base_url': 'http://localhost:8000/v1/completions', 'num_concurrent': 65, 'max_retries': 3, 'tokenized_requests': False}), gen_kwargs: ({}), limit: None, num_fewshot: 3, batch_size: 1

|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|--------|-----:|--------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 3|exact_match|↑ |0.9318|± |0.0069|
| | |strict-match | 3|exact_match|↑ |0.9287|± |0.0071|

With this PR, both ATOM and vLLM-ATOM recover to ~0.93.
amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4
vLLM-ATOM:
local-completions ({'model': '/workspace/shared/data/amd_int/models/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4', 'base_url': 'http://localhost:8000/v1/completions', 'num_concurrent': 65, 'max_retries': 3, 'tokenized_requests': False}), gen_kwargs: ({}), limit: None, num_fewshot: 3, batch_size: 1

|Tasks|Version| Filter |n-shot| Metric | |Value | |Stderr|
|-----|------:|--------|-----:|--------|---|-----:|---|-----:|
|gsm8k| 3|flexible-extract| 3|exact_match|↑ |0.9409|± |0.0065|
| | |strict-match | 3|exact_match|↑ |0.9401|± |0.0065|

For amd/DeepSeek-R1-0528-MXFP4-MTP-MoEFP4 there is no obvious accuracy drop even without this PR, but the changes here still apply to that model and do not hurt its accuracy.
Submission Checklist