[vllm-atom] Fix GLM-5 accuracy in vLLM plugin #669

Open
kliuae-amd wants to merge 1 commit into main from kliuae/plugin_fix_dsa
Conversation

@kliuae-amd
Contributor

Motivation

This PR fixes the accuracy drop of GLM-5 when running in vLLM-ATOM mode.

Technical Details

- Build the ragged layout in every layer.
- Use `top_k_per_row_prefill` from vLLM, which handles indexing more consistently across short and long context lengths.
- Combined with the bugfix for `cp_gather_indexer_k_quant_cache` in aiter (ROCm/aiter#2954), accuracy is restored to baseline.
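As background on the first bullet: a ragged (varlen) layout packs variable-length sequences back to back and addresses them through cumulative offsets instead of padding every sequence to a common length. The sketch below is illustrative only; the function name `build_ragged_layout` is an assumption for this example and not the actual vLLM or aiter API.

```python
# Illustrative sketch of a ragged (varlen) layout, not the PR's actual code:
# sequences are concatenated into one buffer, and cumulative offsets
# (often called cu_seqlens) mark where each request's tokens begin and end.

def build_ragged_layout(seq_lens):
    """Return cumulative start offsets for a batch of variable-length sequences."""
    cu_seqlens = [0]
    for n in seq_lens:
        cu_seqlens.append(cu_seqlens[-1] + n)
    return cu_seqlens

# Three requests of lengths 3, 5, and 2 packed into one 10-token buffer.
cu = build_ragged_layout([3, 5, 2])
print(cu)  # [0, 3, 8, 10]

# Tokens of request i occupy the half-open range [cu[i], cu[i + 1]).
start, end = cu[1], cu[2]
print(start, end)  # 3 8
```

Rebuilding these offsets in every layer (rather than reusing a stale layout) is the kind of per-layer bookkeeping the first bullet refers to.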

Test Plan

Accuracy test with lm_eval

Model: zai-org/GLM-5-FP8

Server command

ATOM_DISABLE_VLLM_PLUGIN=0 \
ATOM_DISABLE_VLLM_PLUGIN_ATTENTION=0 \
vllm serve zai-org/GLM-5-FP8 \
  -tp 8 \
  --gpu-memory-utilization 0.7 \
  --no-enable-prefix-caching \
  --disable-uvicorn-access-log \
  --trust-remote-code \
  --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
  --kv-cache-dtype fp8 \
  --block-size 1

lm_eval command

lm_eval --model local-completions \
  --model_args model=zai-org/GLM-5-FP8,base_url=http://localhost:8000/v1/completions,num_concurrent=64,tokenized_requests=False \
  --tasks gsm8k \
  --num_fewshot 20

Test Result

| Tasks | Version | Filter           | n-shot | Metric      | Value  |   | Stderr |
|-------|---------|------------------|--------|-------------|--------|---|--------|
| gsm8k | 3       | flexible-extract | 20     | exact_match | 0.9431 | ± | 0.0064 |
|       |         | strict-match     | 20     | exact_match | 0.9431 | ± | 0.0064 |
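As a quick plausibility check on the reported numbers (assuming the standard binomial standard error and gsm8k's 1,319-question test split; lm_eval's exact estimator may differ slightly):

```python
import math

# Assumed: exact_match is a per-question 0/1 outcome, so its standard error
# is approximately sqrt(p * (1 - p) / n) for accuracy p over n questions.
p, n = 0.9431, 1319  # reported accuracy; gsm8k test-split size
stderr = math.sqrt(p * (1 - p) / n)
print(round(stderr, 4))  # 0.0064
```

This matches the reported Stderr column, which is consistent with the result coming from a single full pass over the gsm8k test set.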

Submission Checklist

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>