Add B300 config: glm5-fp8-sglang #1051
Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you. PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. Much of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow If additional help is needed, PR authors can reach out to core maintainers over Slack.
```bash
nvidia-smi
```

```bash
hf download "$MODEL"
```
🔴 The new script calls hf download "$MODEL" (line 24), but on B300 the runner overrides MODEL to a local filesystem path (/scratch/models/GLM-5-FP8), which is not a valid HuggingFace repo ID — causing hf download to fail. Remove line 24; models are pre-staged on B300, as confirmed by qwen3.5_fp8_b300_mtp.sh which correctly omits this call.
Extended reasoning...
What the bug is and how it manifests
The new benchmark script benchmarks/single_node/glm5_fp8_b300.sh (line 24) calls hf download "$MODEL". The hf CLI's download subcommand expects a HuggingFace repository identifier in owner/repo format (e.g. zai-org/GLM-5-FP8). Passing a local filesystem path instead causes the command to exit with an error.
The specific code path that triggers it
In runners/launch_b300-nv.sh (single-node branch, line 220), the runner transforms the model identifier before invoking the benchmark script:
```bash
export MODEL="/scratch/models/${MODEL#*/}"
```

So the original config value `zai-org/GLM-5-FP8` becomes `/scratch/models/GLM-5-FP8`. The benchmark script then executes `hf download "/scratch/models/GLM-5-FP8"`, which is not a valid repo ID.
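The runner's rewrite can be reproduced in isolation. This is a minimal sketch of the same parameter expansion, using the model ID from this PR:

```shell
#!/usr/bin/env bash
# Reproduce the runner's model-path rewrite: strip the "owner/" prefix
# from the repo ID and prepend the pre-staged model directory.
MODEL="zai-org/GLM-5-FP8"
MODEL="/scratch/models/${MODEL#*/}"   # ${MODEL#*/} drops the shortest "*/" prefix, i.e. "zai-org/"
echo "$MODEL"                         # prints /scratch/models/GLM-5-FP8
```

`#*/` removes the shortest leading match of `*/`, so only the owner segment is stripped even if the remainder contains no further slashes.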
Why existing code doesn't prevent it
There is no set -e before line 24, so the script continues execution after hf download fails. The SGLang server is then started with --model-path=$MODEL, which correctly points to the pre-staged local path — so the benchmark itself still runs. This masks the bug during casual observation but leaves a broken command and spurious error output in every run's logs.
What the impact would be
Every B300 run of this config will produce an error from hf download in the logs. If the B300 environment ever changes so that /scratch/models/ is not pre-populated (e.g. a new node or a CI dry-run), the benchmark would fail to start because the model would be absent and the server launch would fail. The spurious error also makes log triage harder for operators.
How to fix it
Remove line 24 (hf download "$MODEL") from benchmarks/single_node/glm5_fp8_b300.sh. Models are pre-staged at /scratch/models/ on B300, so no download step is needed. This matches the pattern of the existing B300 SGLang single-node script benchmarks/single_node/qwen3.5_fp8_b300_mtp.sh, which has no hf download call.
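If a download step ever needs to stay for hosts without pre-staged weights, one defensive pattern is to skip it when `MODEL` is already a local path. This is a hypothetical sketch, not the repo's actual code, and it only echoes the download command rather than invoking the `hf` CLI:

```shell
#!/usr/bin/env bash
# Hypothetical guard (illustration only): skip the download when MODEL
# already points at a pre-staged local directory, as on B300.
MODEL="${MODEL:-zai-org/GLM-5-FP8}"

if [ -d "$MODEL" ]; then
  echo "model pre-staged at $MODEL, skipping download"
else
  # On hosts without pre-staged weights, this is where a real script
  # would fetch from the Hub, e.g.: hf download "$MODEL"
  echo "would download: $MODEL"
fi
```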
Step-by-step proof
- `nvidia-master.yaml` config specifies `model: zai-org/GLM-5-FP8` and `runner: b300`.
- `launch_b300-nv.sh` single-node branch (line 220) executes `export MODEL="/scratch/models/${MODEL#*/}"` → `MODEL` becomes `/scratch/models/GLM-5-FP8`.
- The runner then calls the benchmark script with this modified `MODEL`.
- `glm5_fp8_b300.sh` line 24 executes `hf download "/scratch/models/GLM-5-FP8"`.
- `hf download` fails because `/scratch/models/GLM-5-FP8` is not an `org/repo` identifier.
- Since there is no `set -e` at this point, execution continues to the SGLang server launch, which uses `--model-path=$MODEL` (the pre-staged path) and succeeds — hiding the error from benchmarking results but leaving it in logs.
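The masking effect of the missing `set -e` can be seen with a two-line toy script, where `/bin/false` stands in for the failing `hf download`:

```shell
#!/usr/bin/env bash
# No `set -e`: the failing command leaves a non-zero exit status
# behind, but the script keeps going — exactly how the spurious
# download failure is masked by the later successful server launch.
/bin/false
echo "continued; previous exit status: $?"   # prints "continued; previous exit status: 1"
```

With `set -e` at the top, the script would instead abort at `/bin/false` and the broken download would have surfaced immediately.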
```yaml
- "Add GLM-5 FP8 B300 SGLang benchmark"
- "Image: lmsysorg/sglang:v0.5.10.post1-cu130"
- "At the time of submission, https://cookbook.sglang.io/autoregressive/GLM/GLM-5.1 does not have a B300-specific recipe, so this reuses the existing GLM5 FP8 B200 SGLang recipe as-is"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1051
```
🟡 The glm5-fp8-b300-sglang entry in perf-changelog.yaml has a placeholder PR link (pull/XXXX) instead of the actual PR number 1051. It should be updated to #1051.
Extended reasoning...
The perf-changelog.yaml entry for glm5-fp8-b300-sglang was committed with an unresolved placeholder in its pr-link field. The current HEAD of the file contains "pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX" at line 1398, while the PR diff shows the intended value was pull/1051.
The specific code path is straightforward: commit d6e32c3 (the PR #1051 merge commit) introduced the glm5-fp8-b300-sglang entry to perf-changelog.yaml, but the author never replaced the XXXX placeholder before merging. The diff clearly shows pull/1051 as the intended value, yet the committed content still has XXXX.
Nothing in the codebase prevents placeholder values from being committed — there is no pre-commit validation or CI check that would catch an XXXX in a pr-link field. This explains how it slipped through. Other entries in the same file with XXX or XXXX placeholders confirm this is a recurring human error (e.g. glm5-fp8-mi355x-sglang, minimaxm2.5-fp8-h200-vllm).
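A small CI guard could catch these placeholders before merge. This is a hypothetical check, not something the repo currently has; the script name and usage are illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical pre-merge check: fail if any pr-link still carries an
# XXX/XXXX placeholder. Usage: check_changelog.sh perf-changelog.yaml
file="${1:-perf-changelog.yaml}"
if grep -nE 'pr-link:.*pull/X+' "$file"; then
  echo "ERROR: unresolved pr-link placeholder(s) in $file" >&2
  exit 1
fi
echo "pr-links OK"
```

`grep -n` prints the offending lines with line numbers, so the failure message points straight at the entry to fix.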
The impact is limited to documentation/metadata: anyone reading the changelog or trying to trace the history of this benchmark config would find a broken link. The placeholder XXXX does not affect benchmark execution, configuration parsing, or any runtime behavior.
The fix is a one-line change: replace pull/XXXX with pull/1051 on line 1398 of perf-changelog.yaml.
Step-by-step proof: (1) The PR diff shows the new entry ending with "pr-link: #1051". (2) Reading the actual file at HEAD shows the last line is "pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX". (3) Running git show d6e32c3 -- perf-changelog.yaml confirms the committed content has XXXX. (4) The immediately preceding entry (qwen3.5-fp8-b300-sglang-mtp, PR #1035) correctly references its PR number, confirming the XXXX in the glm5 entry is an oversight, not intentional.
At the time of submission, the SGLang GLM-5.1 cookbook does not have a B300-specific recipe, so this config reuses the existing B200 GLM5 FP8 SGLang recipe as-is until B300-specific tuning is available. Image set to lmsysorg/sglang:v0.5.10.post1-cu130 to match the standard B300 SGLang image used by other B300 configs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Adds the `glm5-fp8-b300-sglang` config (GLM-5 FP8 on B300 via SGLang).
- `benchmarks/single_node/glm5_fp8_b300.sh` reuses the existing B200 GLM5 FP8 SGLang recipe as-is — at the time of submission, the SGLang GLM-5.1 cookbook does not yet have a B300-specific recipe. The note is mirrored in `glm5_fp8_b300.sh`, `nvidia-master.yaml`, and `perf-changelog.yaml`.
- Image: `lmsysorg/sglang:v0.5.10.post1-cu130` — the standard B300 SGLang image already used by other B300 configs.
- No changes to `runners/launch_b300-nv.sh` or `.github/workflows/benchmark-tmpl.yml` — already wired up by Add B300 config: qwen3.5-fp8-sglang-mtp #1035.

Test plan

- `glm5-fp8-b300-sglang` runs 1k1k / 8k1k at TP=8, concurrency 4-256.

🤖 Generated with Claude Code