Add B300 config: qwen3.5-fp8-sglang (non-MTP) #1048
Mirrors the existing qwen3.5-fp8-b300-sglang-mtp config but without speculative decoding, for a non-MTP baseline on B300.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you!
PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow
If additional help is needed, PR authors can reach out to core maintainers over Slack.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LGTM — simple non-MTP counterpart config, the one nit (placeholder pr-link) is already flagged inline.
Extended reasoning...
Overview
This PR adds qwen3.5-fp8-b300-sglang as the non-MTP counterpart to the qwen3.5-fp8-b300-sglang-mtp config introduced in #1035. Changes span three files: a new config block in nvidia-master.yaml, a new benchmark shell script, and a perf-changelog entry.
Security risks
None. These are benchmark configuration files and a shell script that follows the same pattern as dozens of existing scripts in the repo. No auth, crypto, or permissions code is touched.
Level of scrutiny
Low. The config block in nvidia-master.yaml is a direct copy of the MTP entry with the spec-decoding: mtp field omitted. The benchmark script is a clean copy of the MTP script with the --speculative-* flags and SGLANG_ENABLE_SPEC_V2=1 env var removed. These are mechanical, pattern-following changes with clear intent and no ambiguity.
Other factors
The only issue found is a metadata nit in perf-changelog.yaml: the pr-link field uses the placeholder pull/XXXX instead of the actual pull/1048. This is already flagged as an inline comment and has zero impact on benchmark execution or correctness. No outstanding human reviewer comments exist on this PR.
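The mechanical derivation described under "Level of scrutiny" can be illustrated with a minimal Python sketch. It is not code from this PR: the helper name strip_speculative and the sample argument values are hypothetical, and the only details taken from the review are the --speculative-* flag prefix and the SGLANG_ENABLE_SPEC_V2 environment variable.

```python
def strip_speculative(args, env):
    """Return copies of args and env with speculative-decoding settings removed.

    Hypothetical helper for illustration; the real scripts were edited by hand.
    """
    kept = []
    skip_value = False
    for i, arg in enumerate(args):
        if skip_value:
            # This token is the value of a dropped "--speculative-x value" pair.
            skip_value = False
            continue
        if arg.startswith("--speculative-"):
            # For the "--flag value" form, also drop the following value token.
            has_inline_value = "=" in arg
            has_next = i + 1 < len(args)
            if not has_inline_value and has_next and not args[i + 1].startswith("--"):
                skip_value = True
            continue
        kept.append(arg)
    # Drop the spec-decoding env var; leave the caller's dict untouched.
    new_env = {k: v for k, v in env.items() if k != "SGLANG_ENABLE_SPEC_V2"}
    return kept, new_env


# Sample values below are assumptions, not the flags used by the MTP script.
mtp_args = ["--tp", "4", "--speculative-algorithm", "EAGLE",
            "--speculative-num-steps=3", "--trust-remote-code"]
mtp_env = {"SGLANG_ENABLE_SPEC_V2": "1"}
base_args, base_env = strip_speculative(mtp_args, mtp_env)
```

Comparing base_args against the arguments in the new script is one way a reviewer could confirm that nothing besides the speculative-decoding settings changed.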
- config-keys:
    - qwen3.5-fp8-b300-sglang
  description:
    - "Add Qwen3.5-397B-A17B-FP8 B300 SGLang benchmark (non-MTP)"
    - "Image: lmsysorg/sglang:v0.5.10.post1-cu130"
    - "TP=4, concurrency 4-256 for 1k1k and 8k1k"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1048
🟡 The perf-changelog.yaml entry for qwen3.5-fp8-b300-sglang uses the placeholder pr-link 'pull/XXXX' instead of the actual PR number 1048. Update line 1398 to read 'pull/1048' to restore changelog traceability — every other recent entry uses the real PR number.
Extended reasoning...
What the bug is: The new perf-changelog.yaml entry for the qwen3.5-fp8-b300-sglang config (lines 1392-1398) has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX instead of the actual PR number 1048. The pr-link field is the primary mechanism used to trace a changelog entry back to the pull request that introduced or updated a benchmark configuration.
How it manifests: Any tooling, script, or human reviewer who tries to cross-reference this changelog entry with the originating PR will follow a broken/nonexistent URL. Unlike most other entries which have real PR numbers (e.g., the sibling qwen3.5-fp8-b300-sglang-mtp entry at line 1390 correctly points to /pull/1035), this entry is untraceable by PR number.
The specific code path: The last entry in perf-changelog.yaml reads:
- config-keys:
    - qwen3.5-fp8-b300-sglang
  description:
    - "Add Qwen3.5-397B-A17B-FP8 B300 SGLang benchmark (non-MTP)"
    - "Image: lmsysorg/sglang:v0.5.10.post1-cu130"
    - "TP=4, concurrency 4-256 for 1k1k and 8k1k"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
The PR diff shows /pull/1048 was intended, but what was actually committed (confirmed via git show e76cbda) is XXXX.
Why existing code doesn't prevent it: There is no validation in CI that checks pr-link fields in perf-changelog.yaml for placeholder values. The placeholder was likely present before the PR number was assigned and was never replaced before merging.
Impact: Changelog traceability is broken for this entry. Anyone auditing which PR introduced the qwen3.5-fp8-b300-sglang config will be unable to follow the link. This is a metadata-only issue with no effect on benchmark execution.
Fix: Change line 1398 from pull/XXXX to pull/1048.
Step-by-step proof:
- Commit e76cbda is the merge commit for PR #1048 ("Add B300 config: qwen3.5-fp8-sglang").
- Running git show e76cbda -- perf-changelog.yaml confirms the committed diff adds pr-link: .../pull/XXXX (not /pull/1048).
- The current file on disk at perf-changelog.yaml line 1398 reads pull/XXXX.
- The adjacent MTP entry (PR #1035) at line 1390 correctly reads pull/1035, establishing the expected pattern.
- Therefore the XXXX placeholder was committed as-is and needs to be corrected to pull/1048.
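The review notes that no CI check validates pr-link fields for placeholder values. A minimal sketch of such a check follows, assuming each pr-link sits on its own line as in the entry quoted above. The function name find_bad_pr_links and the regular expressions are illustrative assumptions, not an existing part of this repository's tooling.

```python
import re

# Matches a "pr-link: <url>" line and captures the URL.
PR_LINK = re.compile(r"^\s*pr-link:\s*(\S+)\s*$")
# A link counts as valid only if it ends in a numeric pull-request id.
VALID = re.compile(r"^https://github\.com/[^/]+/[^/]+/pull/\d+$")


def find_bad_pr_links(text):
    """Return (line_number, url) pairs for pr-link values that are not numeric PR URLs."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = PR_LINK.match(line)
        if match and not VALID.match(match.group(1)):
            bad.append((number, match.group(1)))
    return bad


if __name__ == "__main__":
    import sys

    # Assumed invocation: python check_pr_links.py perf-changelog.yaml
    with open(sys.argv[1], encoding="utf-8") as handle:
        problems = find_bad_pr_links(handle.read())
    for number, url in problems:
        print(f"line {number}: placeholder or malformed pr-link: {url}")
    sys.exit(1 if problems else 0)
```

A line-based scan is used here to avoid a YAML parser dependency; a stricter check could load the file with a YAML library and validate each entry's fields.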
B300 cluster has a subset of nodes with hardware/network issues that cause benchmarks to hang or fail to start. Constrain salloc to the known-working set (b300-[001-006,008-012,017-020]) until those nodes are remediated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without -N 1, salloc would allocate every node matching --nodelist; the single-node benchmark only needs one.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
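To make the known-working set in the commit message above concrete, here is a small Python sketch that expands the bracketed node expression into individual node names. It handles only the simple prefix[ranges] form seen in that message, not the full Slurm hostlist syntax, and expand_hostlist is a hypothetical helper. On a Slurm cluster, scontrol show hostnames performs the same expansion.

```python
import re


def expand_hostlist(expr):
    """Expand a simple 'prefix[a-b,c,...]' expression into a list of host names.

    Only covers the single-bracket form; anything else is returned unchanged.
    """
    match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]+)\]", expr)
    if not match:
        return [expr]
    prefix, body = match.groups()
    hosts = []
    for part in body.split(","):
        low, _, high = part.partition("-")
        high = high or low  # a lone number is a one-element range
        width = len(low)    # preserve zero padding such as "001"
        for index in range(int(low), int(high) + 1):
            hosts.append(f"{prefix}{index:0{width}d}")
    return hosts


nodes = expand_hostlist("b300-[001-006,008-012,017-020]")
```

The resulting list makes it easy to see which nodes fall outside the set, for example b300-007 and b300-013 through b300-016.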
Summary
- Adds qwen3.5-fp8-b300-sglang config (non-MTP counterpart to qwen3.5-fp8-b300-sglang-mtp).
- benchmarks/single_node/qwen3.5_fp8_b300.sh mirrors the MTP script with speculative-decoding args (--speculative-* and SGLANG_ENABLE_SPEC_V2=1) removed.
- No changes to runners/launch_b300-nv.sh or .github/workflows/benchmark-tmpl.yml; already wired up by Add B300 config: qwen3.5-fp8-sglang-mtp #1035.
Test plan
- qwen3.5-fp8-b300-sglang and runs 1k1k / 8k1k at TP=4, concurrency 4–256
🤖 Generated with Claude Code