[AMD][ROCM] Add MI355X Config: glm5-fp4-mi355x-sglang-mtp #1254
ChangLiu0709 wants to merge 1 commit into main from
Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
- Add `glm5-fp4-mi355x-sglang-mtp` config to amd-master.yaml.
- Add benchmarks/single_node/glm5_fp4_mi355x_mtp.sh launch script.
- Image: lmsysorg/sglang-rocm:v0.5.10.post1-rocm700-mi35x-20260428
- Model: amd/GLM-5-MXFP4 (TP=8, FP4/quark quantization)
- EAGLE MTP speculative decoding: num-steps=3, eagle-topk=1, num-draft-tokens=4, behind SGLANG_ENABLE_SPEC_V2=1
- Search space: 1k1k and 8k1k, conc 4-64, spec-decoding=mtp
- Append perf-changelog.yaml entry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
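For orientation, the server invocation the launch script performs can be sketched from the flags and spec-decoding parameters listed in the commit message. This is a rough, print-only sketch, not the actual script: the `sglang.launch_server` entry point, the `TP`/`MODEL` variables, and the overall shape are assumptions.

```shell
#!/usr/bin/env bash
# Illustrative sketch of the MI355X GLM-5 FP4 server launch; flag values
# are taken from the commit message above, structure is assumed.
set -euo pipefail

TP=8
MODEL=amd/GLM-5-MXFP4

# MTP spec decoding is gated behind this env var.
export SGLANG_ENABLE_SPEC_V2=1

CMD=(python3 -m sglang.launch_server
  --model-path "$MODEL"
  --trust-remote-code
  --tp "$TP"
  --chunked-prefill-size 131072
  --disable-radix-cache
  --mem-fraction-static 0.85
  --model-loader-extra-config '{"enable_multithread_load": true}'
  --watchdog-timeout 1200
  --reasoning-parser glm45
  --tool-call-parser glm47
  --speculative-algorithm EAGLE
  --speculative-num-steps 3
  --speculative-eagle-topk 1
  --speculative-num-draft-tokens 4)

# Print rather than exec so the sketch is safe to dry-run anywhere.
echo "${CMD[@]}"
```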
Force-pushed from a40a31b to e8ffa83
Summary
Adds the `glm5-fp4-mi355x-sglang-mtp` config to `.github/configs/amd-master.yaml` and a new `benchmarks/single_node/glm5_fp4_mi355x_mtp.sh` launch script.

- Image: `lmsysorg/sglang-rocm:v0.5.10.post1-rocm700-mi35x-20260428`. It ships transformers with `glm_moe_dsa` support, so no `pip install -U transformers` is needed (unlike `glm5-fp8-mi355x-sglang`).
- Model: `amd/GLM-5-MXFP4`.
- Server flags: `--trust-remote-code`, `--tp $TP`, `--chunked-prefill-size 131072`, `--disable-radix-cache`, `--mem-fraction-static 0.85`, `--model-loader-extra-config '{"enable_multithread_load": true}'`, `--watchdog-timeout 1200`, `--reasoning-parser glm45`, `--tool-call-parser glm47`, plus EAGLE spec decoding (`--speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4`) behind `SGLANG_ENABLE_SPEC_V2=1`.
- Uses `--use-chat-template` per AGENTS.md for MTP.
- Search space: `{ tp: 8, conc-start: 4, conc-end: 64, spec-decoding: mtp }` for 1k1k and 8k1k.
- The `perf-changelog.yaml` diff is append-only.

Dependency
This PR depends on #1253 ([AMD][ROCM] Fix benchmark_serving Rust Tokenizer Crash via Direct transformers AutoTokenizer), which must be merged first.
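To make the dependency concrete, here is a minimal stand-in illustrating the tokenizer crash that #1253 addresses and the shape of the direct-load fix. This is not the actual PR diff: `RustBackedTokenizer` and both helper names are hypothetical, chosen only to mimic the failure mode.

```python
# Hypothetical stand-in for the Rust-backed TokenizersBackend used by
# GLM-5: it can encode text but has no all_special_tokens_extended.
class RustBackedTokenizer:
    def encode(self, text):
        return list(text.encode("utf-8"))


def wrap_tokenizer_old(tok):
    # Mimics the failing vLLM path: get_cached_tokenizer() reads
    # tokenizer.all_special_tokens_extended unconditionally, which
    # raises AttributeError on the Rust backend.
    return tok.all_special_tokens_extended


def get_tokenizer_fixed(tok):
    # Shape of the #1253 fix: use the tokenizer as loaded (in the real
    # PR, via transformers.AutoTokenizer.from_pretrained), skipping the
    # vLLM wrapper and its missing-attribute access.
    return tok


tok = RustBackedTokenizer()

try:
    wrap_tokenizer_old(tok)
    old_path_crashed = False
except AttributeError:
    old_path_crashed = True

print(old_path_crashed)                       # True: old path raises AttributeError
print(get_tokenizer_fixed(tok).encode("hi"))  # [104, 105]
```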
The previous attempt at this config (PR #1091) was marked `[SGLang broken]` and closed without merging. The root cause was `benchmark_serving.py` importing `get_tokenizer` from vLLM, which internally calls `get_cached_tokenizer()` and accesses `tokenizer.all_special_tokens_extended`, an attribute not present on the Rust-backed `TokenizersBackend` used by GLM-5. PR #1253 fixes this by replacing the vLLM import with a direct `transformers.AutoTokenizer.from_pretrained()` call, unblocking this config.

Test plan
- `bash -n benchmarks/single_node/glm5_fp4_mi355x_mtp.sh`: bash syntax OK.
- `git diff perf-changelog.yaml` shows only additions.
- Ran end-to-end on `lmsysorg/sglang-rocm:v0.5.10rc0-rocm700-mi35x-20260422` and `v0.5.10.post1-rocm700-mi35x-20260428` with `amd/GLM-5-MXFP4`, TP=8, MI355X (after applying the PR #1253 tokenizer fix, "[AMD][ROCM] Fix benchmark_serving Rust Tokenizer Crash via Direct transformers AutoTokenizer").

🤖 Generated with Claude Code