[AMD][ROCM] Add MI355X Config: glm5-fp4-mi355x-sglang-mtp by ChangLiu0709 · Pull Request #1254 · SemiAnalysisAI/InferenceX

ChangLiu0709 · 2026-05-01T13:51:37Z

Summary

Adds glm5-fp4-mi355x-sglang-mtp config to .github/configs/amd-master.yaml and a new benchmarks/single_node/glm5_fp4_mi355x_mtp.sh launch script.
Image: lmsysorg/sglang-rocm:v0.5.10.post1-rocm700-mi35x-20260428 — ships transformers with glm_moe_dsa support, so no pip install -U transformers is needed (unlike glm5-fp8-mi355x-sglang).
Model: amd/GLM-5-MXFP4.
Launch flags: --trust-remote-code, --tp $TP, --chunked-prefill-size 131072, --disable-radix-cache, --mem-fraction-static 0.85, --model-loader-extra-config '{"enable_multithread_load": true}', --watchdog-timeout 1200, --reasoning-parser glm45, --tool-call-parser glm47, plus EAGLE spec-decoding (--speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4) behind SGLANG_ENABLE_SPEC_V2=1.
Client passes --use-chat-template per AGENTS.md for MTP.
Search-space: { tp: 8, conc-start: 4, conc-end: 64, spec-decoding: mtp } for 1k1k and 8k1k.
perf-changelog.yaml diff is append-only.
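
The launch flags listed above can be assembled as in the sketch below. It is not the contents of glm5_fp4_mi355x_mtp.sh: the helper name `build_launch_command`, the `python3 -m sglang.launch_server` entry point, and the `--model-path` flag are assumptions about how the script invokes the server. The remaining flags and the environment variable are copied from the list.

```python
import json
import shlex


def build_launch_command(model: str, tp: int) -> tuple[dict[str, str], list[str]]:
    """Return (extra environment, argv) for the server launch (hypothetical helper)."""
    # The spec-decoding flags sit behind this environment variable, per the summary.
    env = {"SGLANG_ENABLE_SPEC_V2": "1"}
    argv = [
        # Assumed entry point and model flag; neither is stated in the PR description.
        "python3", "-m", "sglang.launch_server",
        "--model-path", model,
        # Everything below is taken from the PR summary.
        "--trust-remote-code",
        "--tp", str(tp),
        "--chunked-prefill-size", "131072",
        "--disable-radix-cache",
        "--mem-fraction-static", "0.85",
        "--model-loader-extra-config", json.dumps({"enable_multithread_load": True}),
        "--watchdog-timeout", "1200",
        "--reasoning-parser", "glm45",
        "--tool-call-parser", "glm47",
        # EAGLE speculative decoding (MTP).
        "--speculative-algorithm", "EAGLE",
        "--speculative-num-steps", "3",
        "--speculative-eagle-topk", "1",
        "--speculative-num-draft-tokens", "4",
    ]
    return env, argv


env, argv = build_launch_command("amd/GLM-5-MXFP4", tp=8)
# Print a shell-quoted command line for inspection.
print(" ".join(f"{k}={v}" for k, v in env.items()), shlex.join(argv))
```

Building the arguments as a list and quoting them with `shlex.join` keeps the JSON value for `--model-loader-extra-config` intact when the command is pasted into a shell.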

Dependency

This PR depends on #1253 ([AMD][ROCM] Fix benchmark_serving Rust Tokenizer Crash via Direct transformers AutoTokenizer), which must be merged first.

The previous attempt at this config (PR #1091) was marked [SGLang broken] and closed without merging. The root cause was benchmark_serving.py importing get_tokenizer from vllm, which internally calls get_cached_tokenizer() and accesses tokenizer.all_special_tokens_extended — an attribute not present on the Rust-backed TokenizersBackend used by GLM-5. PR #1253 fixes this by replacing the vllm import with a direct transformers.AutoTokenizer.from_pretrained() call, unblocking this config.
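
A minimal sketch of the failure mode and the fix described above. `RustBackedTokenizerStub`, `touches_extended_tokens`, and `load_tokenizer` are hypothetical names for illustration, and the stub only imitates a tokenizer that lacks the attribute; the actual change in PR #1253 may differ in detail. `transformers.AutoTokenizer.from_pretrained` is the call named above.

```python
class RustBackedTokenizerStub:
    """Hypothetical stand-in for a tokenizer without all_special_tokens_extended."""

    all_special_tokens = ["<eos>"]


def touches_extended_tokens(tokenizer) -> bool:
    """Imitate the attribute access described above; report whether it succeeds."""
    try:
        tokenizer.all_special_tokens_extended
    except AttributeError:
        return False
    return True


def load_tokenizer(model_id: str, trust_remote_code: bool = True):
    """Load the tokenizer directly through transformers, bypassing the vllm helper."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(model_id, trust_remote_code=trust_remote_code)


# The stub lacks the attribute, so a code path that reads it would fail on this object.
print(touches_extended_tokens(RustBackedTokenizerStub()))
```

Calling `load_tokenizer("amd/GLM-5-MXFP4")` would need the transformers package and access to the model files, so it is not exercised here.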

Test plan

YAML parses for both master config and perf-changelog.
bash -n benchmarks/single_node/glm5_fp4_mi355x_mtp.sh — bash syntax OK.
git diff perf-changelog.yaml shows only additions.
Benchmark verified working end-to-end on lmsysorg/sglang-rocm:v0.5.10rc0-rocm700-mi35x-20260422 and v0.5.10.post1-rocm700-mi35x-20260428 with amd/GLM-5-MXFP4, TP=8, MI355X (after applying the tokenizer fix from PR #1253).
CI sweep passes on MI355X.
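
The "only additions" check on perf-changelog.yaml could be automated along these lines. This is a sketch, assuming the input is unified diff text as produced by `git diff perf-changelog.yaml`; the function name `diff_is_additions_only` is hypothetical and not part of the repository, and the sample diff is made up for illustration.

```python
def diff_is_additions_only(diff_text: str) -> bool:
    """Return True if a unified diff contains no removed lines."""
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git"):
            in_hunk = False  # a new file header resets hunk tracking
            continue
        if line.startswith("@@"):
            in_hunk = True  # hunk bodies begin after an @@ header
            continue
        # Inside a hunk, a leading "-" marks a removed line.
        if in_hunk and line.startswith("-"):
            return False
    return True


# Made-up sample: one context line followed by one added line.
sample = (
    "diff --git a/perf-changelog.yaml b/perf-changelog.yaml\n"
    "--- a/perf-changelog.yaml\n"
    "+++ b/perf-changelog.yaml\n"
    "@@ -10,1 +10,2 @@\n"
    " - existing entry\n"
    "+- new entry\n"
)
print(diff_is_additions_only(sample))
```

Tracking hunk boundaries keeps the `--- a/...` file header from being mistaken for a removed line.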

🤖 Generated with Claude Code

github-actions · 2026-05-01T13:51:46Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

- Add `glm5-fp4-mi355x-sglang-mtp` config to amd-master.yaml.
- Add benchmarks/single_node/glm5_fp4_mi355x_mtp.sh launch script.
- Image: lmsysorg/sglang-rocm:v0.5.10.post1-rocm700-mi35x-20260428
- Model: amd/GLM-5-MXFP4 (TP=8, FP4/quark quantization)
- EAGLE MTP speculative decoding: num-steps=3, eagle-topk=1, num-draft-tokens=4, behind SGLANG_ENABLE_SPEC_V2=1
- Search space: 1k1k and 8k1k, conc 4-64, spec-decoding=mtp
- Append perf-changelog.yaml entry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

github-project-automation Bot added this to InferenceMAX Board May 1, 2026

ChangLiu0709 force-pushed the chang/add-glm5-fp4-mi355x-sglang-mtp branch from a40a31b to e8ffa83 on May 1, 2026 13:54

ChangLiu0709 wants to merge 1 commit into main from chang/add-glm5-fp4-mi355x-sglang-mtp
