Add B300 config: glm5-fp4-sglang#1058
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that all GitHub Action jobs fully pass after merging. Failures are often just flakes, and re-running the failed jobs will fix them. If you re-run failed jobs, you are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.
- "Add GLM-5 FP4 (NVFP4) B300 SGLang benchmark"
- "Image: lmsysorg/sglang:v0.5.10.post1-cu130"
- "At the time of submission, https://cookbook.sglang.io/autoregressive/GLM/GLM-5 does not have a B300-specific recipe, so this reuses the existing GLM-5 FP4 B200 SGLang recipe as-is"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1058
🟡 The perf-changelog.yaml entry for glm5-fp4-b300-sglang has pr-link set to .../pull/XXXX instead of the actual PR number .../pull/1058. This should be corrected so the changelog permanently records the correct link.
Extended reasoning...
What the bug is and how it manifests
The perf-changelog.yaml entry for glm5-fp4-b300-sglang (the entry added by this PR) uses a placeholder value XXXX in the pr-link field instead of the actual PR number 1058. The field currently reads:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
The specific code path that triggers it
The affected entry is the last entry in perf-changelog.yaml (line 1414), introduced by this PR. The PR diff itself shows the intended value as /pull/1058, but the code committed to main at commit 82d44b0 still contains the placeholder /XXXX.
Why existing code does not prevent it
There is no automated validation that checks perf-changelog.yaml pr-link fields for unreplaced placeholders like XXXX. The CI config validation would pass since the YAML is syntactically valid — the placeholder is just a wrong URL string.
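A check of the kind described as missing could look roughly like the sketch below. It is only an illustration, not part of the repository's CI: the function name `find_bad_pr_links` and the `name` key in the sample entries are hypothetical, and the sketch scans lines with a regular expression rather than parsing the YAML, so it assumes each `pr-link` sits on its own line.

```python
import re

# Captures the URL on a "pr-link: <url>" line (assumed one per line).
PR_LINK_RE = re.compile(r"^\s*pr-link:\s*(\S+)\s*$")
# A valid link is assumed to end in /pull/<digits>.
PULL_NUMBER_RE = re.compile(r"/pull/(\d+)/?$")


def find_bad_pr_links(changelog_text):
    """Return (line_number, url) pairs whose pr-link lacks a numeric PR id."""
    bad = []
    for lineno, line in enumerate(changelog_text.splitlines(), start=1):
        match = PR_LINK_RE.match(line)
        if match and not PULL_NUMBER_RE.search(match.group(1)):
            bad.append((lineno, match.group(1)))
    return bad


# Illustrative sample only; the real entry layout may differ.
sample = """\
- name: dsr1-fp4-b300-sglang
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1049
- name: glm5-fp4-b300-sglang
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
"""

for lineno, url in find_bad_pr_links(sample):
    print(f"line {lineno}: unreplaced or malformed pr-link: {url}")
```

Matching on "ends in /pull/<digits>" rather than on the literal string XXXX would also catch other placeholder spellings, at the cost of rejecting any link that is not a pull-request URL.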
Impact
The changelog will permanently record a broken link for this entry. Anyone referencing the changelog to find the PR that introduced the glm5-fp4-b300-sglang config will be directed to a non-existent GitHub URL instead of PR #1058.
How to fix it
Replace XXXX with 1058 in the pr-link field:
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1058
Step-by-step proof
- This PR is #1058 (Add B300 config: glm5-fp4-sglang), as shown in the PR metadata.
- The PR diff (hunk for perf-changelog.yaml) shows the added line as + pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1058.
- However, the current file at HEAD (commit 82d44b0) shows the last entry's pr-link as https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.
- The comparable dsr1-fp4-b300-sglang entry directly above (from PR #1049, Add B300 config: dsr1-fp4-sglang (non-MTP)) correctly reads /pull/1049.
- Conclusion: the placeholder XXXX was not replaced before the commit landed on main.
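The one-line fix from the review could also be applied mechanically. The following is a minimal sketch under the assumption that the placeholder is always the literal string XXXX directly after `/pull/` on a `pr-link` line; `fill_pr_number` is a hypothetical helper, not something that exists in the repository.

```python
import re

# Matches "pr-link: <anything>/pull/XXXX" and captures everything before XXXX.
PLACEHOLDER_RE = re.compile(r"(pr-link:\s*\S*/pull/)XXXX\b")


def fill_pr_number(changelog_text, pr_number):
    """Replace XXXX placeholders in pr-link lines; return (new_text, count)."""
    return PLACEHOLDER_RE.subn(
        lambda m: m.group(1) + str(pr_number), changelog_text
    )


before = "  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX\n"
after, replaced = fill_pr_number(before, 1058)
print(after, end="")
print(f"{replaced} placeholder(s) replaced")
```

Returning the replacement count lets a caller fail loudly when nothing was replaced, or when more than one entry still carried the placeholder.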
At the time of submission, the SGLang GLM-5 cookbook (https://cookbook.sglang.io/autoregressive/GLM/GLM-5) does not have a B300-specific recipe, so this config reuses the existing GLM-5 FP4 (NVFP4) B200 SGLang recipe as-is until B300-specific tuning is available. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed from 489db28 to be34819.
Summary
- Add glm5-fp4-b300-sglang benchmark config and the corresponding benchmarks/single_node/glm5_fp4_b300.sh launch script
- Image: lmsysorg/sglang:v0.5.10.post1-cu130 (same as B200), runner: b300, same TP=4/8 and concurrency search space as B200
Test plan
- Run glm5-fp4-b300-sglang single-node benchmark on a B300 node and confirm server starts, benchmark completes, and result file is produced
🤖 Generated with Claude Code
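The last item of the test plan (a result file is produced) could be checked with something like the sketch below. It assumes, without confirmation from this PR, that the benchmark writes a JSON result file; the helper name `result_file_ok` and any path passed to it are hypothetical.

```python
import json
import os


def result_file_ok(path):
    """Return True if path exists, is non-empty, and parses as JSON."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    try:
        with open(path, "r", encoding="utf-8") as handle:
            json.load(handle)
    except (json.JSONDecodeError, UnicodeDecodeError):
        return False
    return True
```

A run script could call this after the benchmark exits and treat a False result as a failure even when the process exit code was zero.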