[AMD][ROCM] Add MI355X Config: glm5-fp4-mi355x-sglang-mtp #1254

Draft
ChangLiu0709 wants to merge 1 commit into main from chang/add-glm5-fp4-mi355x-sglang-mtp

Conversation

@ChangLiu0709
Collaborator

Summary

  • Adds glm5-fp4-mi355x-sglang-mtp config to .github/configs/amd-master.yaml and a new benchmarks/single_node/glm5_fp4_mi355x_mtp.sh launch script.
  • Image: lmsysorg/sglang-rocm:v0.5.10.post1-rocm700-mi35x-20260428 — ships transformers with glm_moe_dsa support, so no pip install -U transformers is needed (unlike glm5-fp8-mi355x-sglang).
  • Model: amd/GLM-5-MXFP4.
  • Launch flags: --trust-remote-code, --tp $TP, --chunked-prefill-size 131072, --disable-radix-cache, --mem-fraction-static 0.85, --model-loader-extra-config '{"enable_multithread_load": true}', --watchdog-timeout 1200, --reasoning-parser glm45, --tool-call-parser glm47, plus EAGLE spec-decoding (--speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4) behind SGLANG_ENABLE_SPEC_V2=1.
  • Client passes --use-chat-template per AGENTS.md for MTP.
  • Search-space: { tp: 8, conc-start: 4, conc-end: 64, spec-decoding: mtp } for 1k1k and 8k1k.
  • perf-changelog.yaml diff is append-only.
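
For reference, the flag list above can be assembled into a single server command. The sketch below is illustrative only and is not the contents of glm5_fp4_mi355x_mtp.sh: the `python3 -m sglang.launch_server` entrypoint and the `--model-path` flag are assumptions about how the script invokes SGLang, and `build_launch_argv` and `launch_env` are hypothetical helper names. All flag values are copied from the summary.

```python
import json
import shlex


def build_launch_argv(model: str = "amd/GLM-5-MXFP4", tp: int = 8) -> list[str]:
    """Assemble the server argv from the flags listed in the PR summary.

    Assumption: the entrypoint and --model-path are not stated in the PR.
    """
    # Serialize with json.dumps so the value is valid JSON (lowercase true).
    loader_cfg = json.dumps({"enable_multithread_load": True})
    return [
        "python3", "-m", "sglang.launch_server",  # assumed entrypoint
        "--model-path", model,                    # assumed flag name
        "--trust-remote-code",
        "--tp", str(tp),
        "--chunked-prefill-size", "131072",
        "--disable-radix-cache",
        "--mem-fraction-static", "0.85",
        "--model-loader-extra-config", loader_cfg,
        "--watchdog-timeout", "1200",
        "--reasoning-parser", "glm45",
        "--tool-call-parser", "glm47",
        # EAGLE speculative decoding settings from the summary.
        "--speculative-algorithm", "EAGLE",
        "--speculative-num-steps", "3",
        "--speculative-eagle-topk", "1",
        "--speculative-num-draft-tokens", "4",
    ]


def launch_env() -> dict[str, str]:
    """Environment variable that the summary says gates the spec-decoding path."""
    return {"SGLANG_ENABLE_SPEC_V2": "1"}


if __name__ == "__main__":
    # Print a copy-pasteable shell line; shlex.join quotes the JSON argument.
    prefix = " ".join(f"{k}={v}" for k, v in launch_env().items())
    print(prefix, shlex.join(build_launch_argv()))
```

Building the command as a list rather than a string keeps the JSON argument to `--model-loader-extra-config` intact without manual shell quoting.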

Dependency

This PR depends on #1253 ([AMD][ROCM] Fix benchmark_serving Rust Tokenizer Crash via Direct transformers AutoTokenizer), which must be merged first.

The previous attempt at this config (PR #1091) was marked [SGLang broken] and closed without merging. The root cause was benchmark_serving.py importing get_tokenizer from vllm, which internally calls get_cached_tokenizer() and accesses tokenizer.all_special_tokens_extended — an attribute not present on the Rust-backed TokenizersBackend used by GLM-5. PR #1253 fixes this by replacing the vllm import with a direct transformers.AutoTokenizer.from_pretrained() call, unblocking this config.
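
The shape of the fix described above, loading the tokenizer through transformers instead of vllm's wrapper, might look roughly like the following. This is a sketch under assumptions, not the actual diff from #1253: the function name `load_benchmark_tokenizer`, the `loader` parameter, and passing `trust_remote_code=True` are all hypothetical. Only `transformers.AutoTokenizer.from_pretrained` is taken from the description.

```python
from typing import Any, Callable, Optional


def load_benchmark_tokenizer(
    model_id: str,
    trust_remote_code: bool = True,  # assumption: not stated in the PR
    loader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Load a tokenizer directly, bypassing vllm's cached-tokenizer wrapper.

    `loader` is a hypothetical injection point so the function can be
    exercised without downloading a model.
    """
    if loader is None:
        # Imported lazily so this module loads even without transformers.
        from transformers import AutoTokenizer

        loader = AutoTokenizer.from_pretrained
    return loader(model_id, trust_remote_code=trust_remote_code)
```

Because nothing here touches `all_special_tokens_extended`, the attribute lookup that failed in the earlier attempt is never performed. In practice the call would be `load_benchmark_tokenizer("amd/GLM-5-MXFP4")`.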

Test plan

  • YAML parses for both master config and perf-changelog.
  • bash -n benchmarks/single_node/glm5_fp4_mi355x_mtp.sh — bash syntax OK.
  • git diff perf-changelog.yaml shows only additions.
  • Benchmark verified working end-to-end on lmsysorg/sglang-rocm:v0.5.10rc0-rocm700-mi35x-20260422 and v0.5.10.post1-rocm700-mi35x-20260428 with amd/GLM-5-MXFP4, TP=8, MI355X (after applying the tokenizer fix from PR #1253).
  • CI sweep passes on MI355X.
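
The append-only check on perf-changelog.yaml can also be expressed as a small script rather than an eyeballed diff. The sketch below is one possible way to do it, not what the CI does; `appended_suffix` is a hypothetical helper that compares the file's previous and new contents as plain strings.

```python
from typing import Optional


def appended_suffix(old_text: str, new_text: str) -> Optional[str]:
    """Return the text appended to old_text, or None if existing content changed.

    A change is append-only exactly when the old contents are an unmodified
    prefix of the new contents.
    """
    if not new_text.startswith(old_text):
        return None
    return new_text[len(old_text):]


if __name__ == "__main__":
    # Toy data for illustration; these are not real changelog entries.
    old = "- entry: a\n"
    new = old + "- entry: b\n"
    print(repr(appended_suffix(old, new)))
```

The old contents could be obtained with `git show main:perf-changelog.yaml` and the new contents read from the working tree. This check is stricter than "the diff shows only additions", since it also rejects lines inserted in the middle of the file.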

🤖 Generated with Claude Code

@github-actions
Contributor

github-actions Bot commented May 1, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first, before we merge your PR into the master branch. Let's ensure that the documentation is first-class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

- Add `glm5-fp4-mi355x-sglang-mtp` config to amd-master.yaml.
- Add benchmarks/single_node/glm5_fp4_mi355x_mtp.sh launch script.
- Image: lmsysorg/sglang-rocm:v0.5.10.post1-rocm700-mi35x-20260428
- Model: amd/GLM-5-MXFP4 (TP=8, FP4/quark quantization)
- EAGLE MTP speculative decoding: num-steps=3, eagle-topk=1,
  num-draft-tokens=4, behind SGLANG_ENABLE_SPEC_V2=1
- Search space: 1k1k and 8k1k, conc 4-64, spec-decoding=mtp
- Append perf-changelog.yaml entry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ChangLiu0709 force-pushed the chang/add-glm5-fp4-mi355x-sglang-mtp branch from a40a31b to e8ffa83 on May 1, 2026 13:54