Add B300 config: qwen3.5-fp8-sglang (non-MTP) by functionstackx · Pull Request #1048 · SemiAnalysisAI/InferenceX

functionstackx · 2026-04-17T05:50:05Z

Summary

Adds qwen3.5-fp8-b300-sglang config (non-MTP counterpart to qwen3.5-fp8-b300-sglang-mtp).
New benchmark script benchmarks/single_node/qwen3.5_fp8_b300.sh mirrors the MTP script with speculative-decoding args (--speculative-* and SGLANG_ENABLE_SPEC_V2=1) removed.
No changes to runners/launch_b300-nv.sh or .github/workflows/benchmark-tmpl.yml; both were already wired up by #1035 (Add B300 config: qwen3.5-fp8-sglang-mtp).
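A minimal sketch of the transformation described above, assuming the server arguments and environment are available as a list and a dict. The helper name `strip_speculative` and the sample argument values are hypothetical; the actual flags in the MTP script are not shown in this conversation, only the `--speculative-*` prefix and the `SGLANG_ENABLE_SPEC_V2` variable are.

```python
def strip_speculative(args, env):
    """Return (args, env) with speculative-decoding settings removed.

    Hypothetical helper: drops every `--speculative-*` flag (and its value,
    if the value is passed as a separate token) plus SGLANG_ENABLE_SPEC_V2.
    """
    kept = []
    drop_next_value = False
    for arg in args:
        # A bare token right after a dropped "--flag value" pair is its value.
        if drop_next_value and not arg.startswith("--"):
            drop_next_value = False
            continue
        drop_next_value = False
        if arg.startswith("--speculative-"):
            # "--flag=value" carries its value inline; "--flag value" does not.
            drop_next_value = "=" not in arg
            continue
        kept.append(arg)
    clean_env = {k: v for k, v in env.items() if k != "SGLANG_ENABLE_SPEC_V2"}
    return kept, clean_env


# Illustrative input only; not copied from the real benchmark script.
mtp_args = ["--tp", "4", "--speculative-algorithm", "EAGLE",
            "--speculative-num-steps=3", "--port", "8000"]
mtp_env = {"SGLANG_ENABLE_SPEC_V2": "1", "KEEP_ME": "x"}
base_args, base_env = strip_speculative(mtp_args, mtp_env)
```

Treating the non-MTP script as "MTP minus the speculative settings" keeps the two runs comparable, since everything else in the launch command stays identical.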

Test plan

Sweep picks up qwen3.5-fp8-b300-sglang and runs 1k1k / 8k1k at TP=4, concurrency 4–256
Results publish to inferencex.com alongside the MTP numbers for comparison
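As a rough illustration of the sweep in the test plan, the sketch below enumerates the benchmark points. It assumes concurrency steps in powers of two between 4 and 256, which the PR does not state; the function names and the dict layout are also hypothetical and are not taken from the sweep tooling.

```python
from itertools import product


def concurrency_levels(lo=4, hi=256):
    """Powers-of-two steps from lo to hi inclusive (an assumed step pattern)."""
    levels = []
    c = lo
    while c <= hi:
        levels.append(c)
        c *= 2
    return levels


def sweep_points(tp=4, scenarios=("1k1k", "8k1k")):
    """Cross every scenario with every concurrency level at a fixed TP size."""
    return [
        {"scenario": s, "tp": tp, "concurrency": c}
        for s, c in product(scenarios, concurrency_levels())
    ]


points = sweep_points()
```

Under that assumption the sweep covers 7 concurrency levels per scenario, 14 points in total.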

🤖 Generated with Claude Code

github-actions · 2026-04-17T05:50:15Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude

LGTM — simple non-MTP counterpart config, the one nit (placeholder pr-link) is already flagged inline.

Extended reasoning...

Overview

This PR adds qwen3.5-fp8-b300-sglang as the non-MTP counterpart to the qwen3.5-fp8-b300-sglang-mtp config introduced in #1035. Changes span three files: a new config block in nvidia-master.yaml, a new benchmark shell script, and a perf-changelog entry.

Security risks

None. These are benchmark configuration files and a shell script that follows the same pattern as dozens of existing scripts in the repo. No auth, crypto, or permissions code is touched.

Level of scrutiny

Low. The config block in nvidia-master.yaml is a direct copy of the MTP entry with the spec-decoding: mtp field omitted. The benchmark script is a clean copy of the MTP script with the --speculative-* flags and SGLANG_ENABLE_SPEC_V2=1 env var removed. These are mechanical, pattern-following changes with clear intent and no ambiguity.

Other factors

The only issue found is a metadata nit in perf-changelog.yaml: the pr-link field uses the placeholder pull/XXXX instead of the actual pull/1048. This is already flagged as an inline comment and has zero impact on benchmark execution or correctness. No outstanding human reviewer comments exist on this PR.

claude · 2026-04-17T06:05:29Z

+- config-keys:
+    - qwen3.5-fp8-b300-sglang
+  description:
+    - "Add Qwen3.5-397B-A17B-FP8 B300 SGLang benchmark (non-MTP)"
+    - "Image: lmsysorg/sglang:v0.5.10.post1-cu130"
+    - "TP=4, concurrency 4-256 for 1k1k and 8k1k"
+  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1048


🟡 The perf-changelog.yaml entry for qwen3.5-fp8-b300-sglang uses the placeholder pr-link 'pull/XXXX' instead of the actual PR number 1048. Update line 1398 to read 'pull/1048' to restore changelog traceability — every other recent entry uses the real PR number.

Extended reasoning...

What the bug is: The new perf-changelog.yaml entry for the qwen3.5-fp8-b300-sglang config (lines 1392-1398) has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX instead of the actual PR number 1048. The pr-link field is the primary mechanism used to trace a changelog entry back to the pull request that introduced or updated a benchmark configuration.

How it manifests: Any tooling, script, or human reviewer who tries to cross-reference this changelog entry with the originating PR will follow a broken/nonexistent URL. Unlike most other entries which have real PR numbers (e.g., the sibling qwen3.5-fp8-b300-sglang-mtp entry at line 1390 correctly points to /pull/1035), this entry is untraceable by PR number.

The specific code path: The last entry in perf-changelog.yaml reads:

- config-keys:
    - qwen3.5-fp8-b300-sglang
  description:
    - "Add Qwen3.5-397B-A17B-FP8 B300 SGLang benchmark (non-MTP)"
    - "Image: lmsysorg/sglang:v0.5.10.post1-cu130"
    - "TP=4, concurrency 4-256 for 1k1k and 8k1k"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX

The PR diff shows /pull/1048 was intended, but what was actually committed (confirmed via git show e76cbda) is XXXX.

Why existing code doesn't prevent it: There is no validation in CI that checks pr-link fields in perf-changelog.yaml for placeholder values. The placeholder was likely present before the PR number was assigned and was never replaced before merging.

Impact: Changelog traceability is broken for this entry. Anyone auditing which PR introduced the qwen3.5-fp8-b300-sglang config will be unable to follow the link. This is a metadata-only issue with no effect on benchmark execution.

Fix: Change line 1398 from pull/XXXX to pull/1048.

Step-by-step proof:

Commit e76cbda is the merge commit for PR #1048, "Add B300 config: qwen3.5-fp8-sglang (non-MTP)" (commit title: "Add B300 config: qwen3.5-fp8-sglang").

Running git show e76cbda -- perf-changelog.yaml confirms the committed diff adds pr-link: .../pull/XXXX (not /pull/1048).

The current file on disk at perf-changelog.yaml line 1398 reads pull/XXXX.

The adjacent MTP entry (PR #1035, "Add B300 config: qwen3.5-fp8-sglang-mtp") at line 1390 correctly reads pull/1035, establishing the expected pattern.

Therefore the XXXX placeholder was committed as-is and needs to be corrected to pull/1048.
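The review notes that nothing in CI validates pr-link fields. A minimal sketch of such a check follows, assuming each link sits on its own `pr-link:` line as in the entry quoted in this review. The function name, the regexes, and the sample text are illustrative and are not part of the repository.

```python
import re

# Matches a "pr-link: <value>" line at any indentation.
PR_LINK = re.compile(r"^\s*pr-link:\s*(\S+)\s*$")
# A link is accepted only if it ends in a numeric pull-request ID.
VALID = re.compile(r"^https://github\.com/[\w.-]+/[\w.-]+/pull/\d+$")


def find_bad_pr_links(text):
    """Return (line_number, value) for every pr-link that is not a real PR URL."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = PR_LINK.match(line)
        if m and not VALID.match(m.group(1)):
            bad.append((lineno, m.group(1)))
    return bad


SAMPLE = """\
- config-keys:
    - qwen3.5-fp8-b300-sglang-mtp
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1035
- config-keys:
    - qwen3.5-fp8-b300-sglang
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX
"""

bad_links = find_bad_pr_links(SAMPLE)
# -> [(6, "https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX")]
```

A check like this could run as a pre-merge step and fail when the returned list is non-empty; it deliberately uses plain regexes so it does not depend on a YAML parser.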

B300 cluster has a subset of nodes with hardware/network issues that cause benchmarks to hang or fail to start. Constrain salloc to the known-working set (b300-[001-006,008-012,017-020]) until those nodes are remediated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
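To make the known-working set concrete, the sketch below expands the bracketed node expression from the commit message into individual host names. `expand_hostlist` is a hypothetical helper that handles only the single-bracket form used here; it is not a Slurm API and does not cover the full hostlist syntax.

```python
import re


def expand_hostlist(expr):
    """Expand 'prefix[a-b,c-d]' into a list of zero-padded host names."""
    m = re.fullmatch(r"([^\[\]]+)\[([^\[\]]+)\]", expr)
    if not m:
        return [expr]  # plain host name, nothing to expand
    prefix, body = m.groups()
    hosts = []
    for part in body.split(","):
        lo, _, hi = part.partition("-")
        hi = hi or lo          # a single number such as "007" is its own range
        width = len(lo)        # preserve zero padding, e.g. 001
        hosts.extend(f"{prefix}{n:0{width}d}" for n in range(int(lo), int(hi) + 1))
    return hosts


nodes = expand_hostlist("b300-[001-006,008-012,017-020]")
```

The expression covers 15 nodes and leaves out b300-007 and b300-013 through b300-016.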

Add B300 config: qwen3.5-fp8-sglang

e76cbda

Mirrors the existing qwen3.5-fp8-b300-sglang-mtp config but without speculative decoding, for a non-MTP baseline on B300.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

functionstackx requested a review from a team April 17, 2026 05:50

functionstackx requested review from jgangani and kedarpotdar-nv as code owners April 17, 2026 05:50

github-project-automation Bot added this to InferenceMAX Board Apr 17, 2026

Fill in PR link for qwen3.5-fp8-b300-sglang changelog entry

8685590

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

functionstackx added the sweep-enabled label Apr 17, 2026

claude Bot reviewed Apr 17, 2026

functionstackx and others added 2 commits April 17, 2026 02:47

Request a single node from the B300 nodelist

e18c308

Without -N 1, salloc would allocate every node matching --nodelist; the single-node benchmark only needs one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

functionstackx merged commit c0f62a2 into main Apr 17, 2026
30 checks passed

functionstackx deleted the claude/add-qwen3.5-fp8-b300-non-mtp branch April 17, 2026 07:20

github-project-automation Bot moved this to Done in InferenceMAX Board Apr 17, 2026

This was referenced Apr 17, 2026

Add B300 config: minimaxm2.5-fp4-vllm #1055

Merged

Add B300 config: dsr1-fp8-sglang (non-MTP) #1050

Merged

functionstackx merged 4 commits into main from claude/add-qwen3.5-fp8-b300-non-mtp
