Add B300 config: glm5-fp8-sglang #1051
Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you. PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Much of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow If additional help is needed, PR authors can reach out to core maintainers over Slack.
```bash
nvidia-smi
```

```bash
hf download "$MODEL"
```
🔴 The new script calls hf download "$MODEL" (line 24), but on B300 the runner overrides MODEL to a local filesystem path (/scratch/models/GLM-5-FP8), which is not a valid HuggingFace repo ID — causing hf download to fail. Remove line 24; models are pre-staged on B300, as confirmed by qwen3.5_fp8_b300_mtp.sh which correctly omits this call.
Extended reasoning...
What the bug is and how it manifests
The new benchmark script benchmarks/single_node/glm5_fp8_b300.sh (line 24) calls hf download "$MODEL". The hf CLI's download subcommand expects a HuggingFace repository identifier in owner/repo format (e.g. zai-org/GLM-5-FP8). Passing a local filesystem path instead causes the command to exit with an error.
The specific code path that triggers it
In runners/launch_b300-nv.sh (single-node branch, line 220), the runner transforms the model identifier before invoking the benchmark script:
```bash
export MODEL="/scratch/models/${MODEL#*/}"
```

So the original config value `zai-org/GLM-5-FP8` becomes `/scratch/models/GLM-5-FP8`. The benchmark script then executes `hf download "/scratch/models/GLM-5-FP8"`, which is not a valid repo ID.
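The runner's rewrite can be reproduced in isolation. This is a minimal sketch of the same parameter expansion, using the model ID from this PR:

```shell
#!/usr/bin/env bash
# Reproduce the runner's model-path rewrite: strip the "owner/" prefix
# from the repo ID and prepend the pre-staged model directory.
MODEL="zai-org/GLM-5-FP8"
MODEL="/scratch/models/${MODEL#*/}"   # ${MODEL#*/} drops the shortest "*/" prefix, i.e. "zai-org/"
echo "$MODEL"                         # prints /scratch/models/GLM-5-FP8
```

`#*/` removes the shortest leading match of `*/`, so only the owner segment is stripped even if the remainder contains no further slashes.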
Why existing code doesn't prevent it
There is no set -e before line 24, so the script continues execution after hf download fails. The SGLang server is then started with --model-path=$MODEL, which correctly points to the pre-staged local path — so the benchmark itself still runs. This masks the bug during casual observation but leaves a broken command and spurious error output in every run's logs.
What the impact would be
Every B300 run of this config will produce an error from hf download in the logs. If the B300 environment ever changes so that /scratch/models/ is not pre-populated (e.g. a new node or a CI dry-run), the benchmark would fail to start because the model would be absent and the server launch would fail. The spurious error also makes log triage harder for operators.
How to fix it
Remove line 24 (hf download "$MODEL") from benchmarks/single_node/glm5_fp8_b300.sh. Models are pre-staged at /scratch/models/ on B300, so no download step is needed. This matches the pattern of the existing B300 SGLang single-node script benchmarks/single_node/qwen3.5_fp8_b300_mtp.sh, which has no hf download call.
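If a download step ever needs to stay for hosts without pre-staged weights, one defensive pattern is to skip it when `MODEL` is already a local path. This is a hypothetical sketch, not the repo's actual code, and it only echoes the download command rather than invoking the `hf` CLI:

```shell
#!/usr/bin/env bash
# Hypothetical guard (illustration only): skip the download when MODEL
# already points at a pre-staged local directory, as on B300.
MODEL="${MODEL:-zai-org/GLM-5-FP8}"

if [ -d "$MODEL" ]; then
  echo "model pre-staged at $MODEL, skipping download"
else
  # On hosts without pre-staged weights, this is where a real script
  # would fetch from the Hub, e.g.: hf download "$MODEL"
  echo "would download: $MODEL"
fi
```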
Step-by-step proof
- `nvidia-master.yaml` config specifies `model: zai-org/GLM-5-FP8` and `runner: b300`.
- `launch_b300-nv.sh` single-node branch (line 220) executes `export MODEL="/scratch/models/${MODEL#*/}"` → `MODEL` becomes `/scratch/models/GLM-5-FP8`.
- The runner then calls the benchmark script with this modified `MODEL`.
- `glm5_fp8_b300.sh` line 24 executes `hf download "/scratch/models/GLM-5-FP8"`.
- `hf download` fails because `/scratch/models/GLM-5-FP8` is not an `org/repo` identifier.
- Since there is no `set -e` at this point, execution continues to the SGLang server launch, which uses `--model-path=$MODEL` (the pre-staged path) and succeeds — hiding the error from benchmarking results but leaving it in logs.
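The masking effect of the missing `set -e` can be seen with a two-line toy script, where `/bin/false` stands in for the failing `hf download`:

```shell
#!/usr/bin/env bash
# No `set -e`: the failing command leaves a non-zero exit status
# behind, but the script keeps going — exactly how the spurious
# download failure is masked by the later successful server launch.
/bin/false
echo "continued; previous exit status: $?"   # prints "continued; previous exit status: 1"
```

With `set -e` at the top, the script would instead abort at `/bin/false` and the broken download would have surfaced immediately.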
```yaml
- "Add GLM-5 FP8 B300 SGLang benchmark"
- "Image: lmsysorg/sglang:v0.5.10.post1-cu130"
- "At the time of submission, https://cookbook.sglang.io/autoregressive/GLM/GLM-5.1 does not have a B300-specific recipe, so this reuses the existing GLM5 FP8 B200 SGLang recipe as-is"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1051
```
🟡 The glm5-fp8-b300-sglang entry in perf-changelog.yaml has a placeholder PR link (pull/XXXX) instead of the actual PR number 1051. It should be updated to #1051.
Extended reasoning...
The perf-changelog.yaml entry for glm5-fp8-b300-sglang was committed with an unresolved placeholder in its pr-link field. The current HEAD of the file contains "pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX" at line 1398, while the PR diff shows the intended value was pull/1051.
The specific code path is straightforward: commit d6e32c3 (the PR #1051 merge commit) introduced the glm5-fp8-b300-sglang entry to perf-changelog.yaml, but the author never replaced the XXXX placeholder before merging. The diff clearly shows pull/1051 as the intended value, yet the committed content still has XXXX.
Nothing in the codebase prevents placeholder values from being committed — there is no pre-commit validation or CI check that would catch an XXXX in a pr-link field. This explains how it slipped through. Other entries in the same file with XXX or XXXX placeholders confirm this is a recurring human error (e.g. glm5-fp8-mi355x-sglang, minimaxm2.5-fp8-h200-vllm).
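A small CI guard could catch these placeholders before merge. This is a hypothetical check, not something the repo currently has; the script name and usage are illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical pre-merge check: fail if any pr-link still carries an
# XXX/XXXX placeholder. Usage: check_changelog.sh perf-changelog.yaml
file="${1:-perf-changelog.yaml}"
if grep -nE 'pr-link:.*pull/X+' "$file"; then
  echo "ERROR: unresolved pr-link placeholder(s) in $file" >&2
  exit 1
fi
echo "pr-links OK"
```

`grep -n` prints the offending lines with line numbers, so the failure message points straight at the entry to fix.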
The impact is limited to documentation/metadata: anyone reading the changelog or trying to trace the history of this benchmark config would find a broken link. The placeholder XXXX does not affect benchmark execution, configuration parsing, or any runtime behavior.
The fix is a one-line change: replace pull/XXXX with pull/1051 on line 1398 of perf-changelog.yaml.
Step-by-step proof: (1) The PR diff shows the new entry ending with "pr-link: #1051". (2) Reading the actual file at HEAD shows the last line is "pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX". (3) Running git show d6e32c3 -- perf-changelog.yaml confirms the committed content has XXXX. (4) The immediately preceding entry (qwen3.5-fp8-b300-sglang-mtp, PR #1035) correctly references its PR number, confirming the XXXX in the glm5 entry is an oversight, not intentional.
At the time of submission, the SGLang GLM-5.1 cookbook does not have a B300-specific recipe, so this config reuses the existing B200 GLM5 FP8 SGLang recipe as-is until B300-specific tuning is available. Image set to lmsysorg/sglang:v0.5.10.post1-cu130 to match the standard B300 SGLang image used by other B300 configs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Adds the `glm5-fp8-b300-sglang` config (GLM-5 FP8 on B300 via SGLang).
- `benchmarks/single_node/glm5_fp8_b300.sh` reuses the existing B200 GLM5 FP8 SGLang recipe as-is — at the time of submission, the SGLang GLM-5.1 cookbook does not yet have a B300-specific recipe. The note is mirrored in `glm5_fp8_b300.sh`, `nvidia-master.yaml`, and `perf-changelog.yaml`.
- Image: `lmsysorg/sglang:v0.5.10.post1-cu130` — the standard B300 SGLang image already used by other B300 configs.
- No changes to `runners/launch_b300-nv.sh` or `.github/workflows/benchmark-tmpl.yml` — already wired up by Add B300 config: qwen3.5-fp8-sglang-mtp #1035.

Test plan

- `glm5-fp8-b300-sglang` runs 1k1k / 8k1k at TP=8, concurrency 4-256.

🤖 Generated with Claude Code