
Add B300 config: dsr1-fp4-sglang (non-MTP)#1049

Merged
functionstackx merged 4 commits into main from claude/add-dsr1-fp4-b300-sglang
Apr 17, 2026

Conversation

@functionstackx
Contributor

Summary

  • Adds dsr1-fp4-b300-sglang config (non-MTP DeepSeek-R1 FP4 on B300 via SGLang).
  • New benchmark script benchmarks/single_node/dsr1_fp4_b300.sh reuses the existing B200 DSR1 FP4 SGLang recipe as-is — the SGLang DSR1 cookbook does not yet have a B300-specific recipe. A comment at the top of the script records this.
  • Image bumped to lmsysorg/sglang:v0.5.10.post1-cu130 to match the standard B300 SGLang image already used by the Qwen3.5 B300 configs.
  • No changes to runners/launch_b300-nv.sh or .github/workflows/benchmark-tmpl.yml — already wired up by Add B300 config: qwen3.5-fp8-sglang-mtp #1035.

Test plan

  • Sweep picks up dsr1-fp4-b300-sglang and runs 1k1k / 8k1k across TP=4/EP=4 and TP=8/EP=8 search space
  • Results publish to inferencex.com and look sane relative to B200 DSR1 FP4 SGLang

🤖 Generated with Claude Code

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR for the recipe first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that all GitHub Actions jobs fully pass after merging. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Comment thread benchmarks/single_node/dsr1_fp4_b300.sh
Comment thread perf-changelog.yaml
@functionstackx functionstackx marked this pull request as draft April 17, 2026 06:48
functionstackx and others added 4 commits April 17, 2026 03:21
The SGLang cookbook does not have a B300-specific DSR1 FP4 recipe, so this
config reuses the existing B200 DSR1 FP4 SGLang recipe as-is until B300-
specific tuning is available. Image bumped to v0.5.10.post1-cu130 to match
the standard B300 SGLang image used by other B300 configs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Also mirror the note into nvidia-master.yaml and perf-changelog.yaml so
future readers can see why the B200 recipe is being reused for B300.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The /scratch/models override was unconditional, forcing every single-node
B300 job to use a pre-staged local model path. Only Qwen3.5-397B-A17B-FP8
is pre-staged there; DSR1 and other models need to fall through so their
benchmark scripts can `hf download` into the mounted cache.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@functionstackx functionstackx force-pushed the claude/add-dsr1-fp4-b300-sglang branch from ecabc90 to 20dbe26 Compare April 17, 2026 07:21
@functionstackx functionstackx marked this pull request as ready for review April 17, 2026 07:47
Comment thread runners/launch_b300-nv.sh
Comment on lines 219 to 227
```diff
 HF_HUB_CACHE_MOUNT="/scratch/models"
-export MODEL="/scratch/models/${MODEL#*/}"
+# Qwen3.5-397B-A17B-FP8 is pre-staged under /scratch/models on the B300 cluster,
+# so point MODEL at the local copy. Other models fall through and use `hf download`
+# against the mounted cache from their benchmark script.
+if [[ "$MODEL" == "Qwen/Qwen3.5-397B-A17B-FP8" ]]; then
+  export MODEL="/scratch/models/${MODEL#*/}"
+fi
 SQUASH_FILE="/data/squash/$(echo "$IMAGE" | sed 's/[\/:@#]/_/g').sqsh"
 FRAMEWORK_SUFFIX=$([[ "$FRAMEWORK" == "trt" ]] && printf '_trt' || printf '')
```
Contributor


🔴 The MODEL path rewrite in launch_b300-nv.sh was narrowed to apply only to Qwen/Qwen3.5-397B-A17B-FP8. For the new dsr1-fp4-b300-sglang config (model: nvidia/DeepSeek-R1-0528-FP4-V2), MODEL is never rewritten and remains the HuggingFace repo ID. Inside the B300 single-node container, HF_HUB_CACHE=/mnt/hf_hub_cache/ (set by CI) is exported via --export=ALL, but /mnt/hf_hub_cache/ is never mounted; only /scratch/models:/scratch/models is. As a result, hf download nvidia/DeepSeek-R1-0528-FP4-V2 cannot find the pre-staged model and may attempt a ~600 GB internet download, and --model-path nvidia/DeepSeek-R1-0528-FP4-V2 will cause SGLang to fail to start. Fix: add nvidia/DeepSeek-R1-0528-FP4-V2 to the if condition in the single-node else branch so MODEL is rewritten to /scratch/models/DeepSeek-R1-0528-FP4-V2, matching the Qwen pattern and consistent with what the multinode branch already does for this model.

Extended reasoning...

What the bug is and how it manifests

PR #1035 introduced the B300 single-node runner and originally rewrote MODEL unconditionally: export MODEL="/scratch/models/${MODEL#*/}". This PR changed that to a conditional that only rewrites when MODEL == "Qwen/Qwen3.5-397B-A17B-FP8", with a comment saying other models "fall through and use hf download". For the new dsr1-fp4-b300-sglang config (model: nvidia/DeepSeek-R1-0528-FP4-V2), MODEL therefore remains the raw HuggingFace repo ID when the benchmark script is invoked.

The specific code path that triggers it

benchmark-tmpl.yml line 74 sets HF_HUB_CACHE=/mnt/hf_hub_cache/. The B300 single-node runner sets HF_HUB_CACHE_MOUNT="/scratch/models" and mounts it via --container-mounts as /scratch/models:/scratch/models. It passes --export=ALL, which exports HF_HUB_CACHE=/mnt/hf_hub_cache/ into the container, but /mnt/hf_hub_cache/ is never mounted there. When dsr1_fp4_b300.sh runs inside the container, MODEL=nvidia/DeepSeek-R1-0528-FP4-V2, so line 23 (hf download "$MODEL") invokes huggingface-cli against HF_HUB_CACHE=/mnt/hf_hub_cache/, which does not exist in the container. The model cannot be found, and huggingface-cli will attempt a ~600 GB internet download or fail. Similarly, --model-path nvidia/DeepSeek-R1-0528-FP4-V2 on the SGLang launch command causes SGLang to attempt the same broken HF resolution path, preventing server startup entirely.

Why existing code does not prevent it

The PR comment says other models can use hf download against the mounted cache. However, the cache mount is at /scratch/models:/scratch/models, while HF_HUB_CACHE (exported from the host) points to /mnt/hf_hub_cache/. These paths are misaligned, so the HF cache lookup fails inside the container. B200 handles this correctly by mounting -v /raid/hf_hub_cache/:/mnt/hf_hub_cache/, making the mount point match the env var. B300 does not do this; it mounts at /scratch/models but exports HF_HUB_CACHE=/mnt/hf_hub_cache/.
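The misalignment is easiest to see side by side. The following is a sketch, not the exact runner invocations; flags are abbreviated and paths are taken from the analysis above:

```shell
# B200 runner (works): the mount target matches the env var HF tooling reads.
docker run -v /raid/hf_hub_cache/:/mnt/hf_hub_cache/ \
           -e HF_HUB_CACHE=/mnt/hf_hub_cache/ ...

# B300 single-node runner (misaligned): HF_HUB_CACHE survives --export=ALL,
# but nothing is mounted at /mnt/hf_hub_cache/ inside the container, so
# cache lookups miss and hf download falls back to the internet.
srun --container-mounts=/scratch/models:/scratch/models \
     --export=ALL ...   # HF_HUB_CACHE=/mnt/hf_hub_cache/ points at an unmounted path
```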

What the impact would be

The dsr1-fp4-b300-sglang benchmark will fail entirely: SGLang cannot locate the pre-staged model weights, so the server fails to start and all benchmark runs produce no results. At best, a ~600 GB download is attempted and times out; at worst the job fails immediately. This completely blocks the new config from producing any benchmark data.

Step-by-step proof

  1. CI runs a dsr1-fp4-b300-sglang job with model: nvidia/DeepSeek-R1-0528-FP4-V2 from nvidia-master.yaml.
  2. launch_b300-nv.sh enters the single-node else branch. The condition if [[ "$MODEL" == "Qwen/Qwen3.5-397B-A17B-FP8" ]] is false, so MODEL stays as nvidia/DeepSeek-R1-0528-FP4-V2.
  3. The runner mounts /scratch/models:/scratch/models and exports HF_HUB_CACHE=/mnt/hf_hub_cache/ via --export=ALL.
  4. Inside the container: /mnt/hf_hub_cache/ is not mounted; /scratch/models is mounted but HF does not know to look there.
  5. dsr1_fp4_b300.sh line 23: hf download "nvidia/DeepSeek-R1-0528-FP4-V2" — the CLI checks HF_HUB_CACHE=/mnt/hf_hub_cache/ (missing), then falls back, and initiates a download of the ~600 GB model.
  6. --model-path nvidia/DeepSeek-R1-0528-FP4-V2 on the SGLang server launch — SGLang makes the same HF lookup, fails to find local weights, and the server cannot start.
  7. Contrast with the multinode branch of the same script (lines 23-26): it explicitly sets MODEL_PATH="/scratch/models/deepseek-r1-0528-nvfp4-v2" for dsr1-fp4, confirming the model IS pre-staged at that path on the B300 cluster.
  8. Fix: add an elif clause to rewrite MODEL to /scratch/models/DeepSeek-R1-0528-FP4-V2 for nvidia/DeepSeek-R1-0528-FP4-V2, consistent with the strip-prefix pattern used for Qwen and with the multinode branch's explicit local path.
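The fix in step 8 can be sketched as follows. This is a sketch of the single-node else branch, not the verbatim contents of launch_b300-nv.sh; the example MODEL assignment is added here only to make the snippet self-contained:

```shell
#!/usr/bin/env bash
# Example input for the sketch; in the runner, MODEL comes from the config.
MODEL="${MODEL:-nvidia/DeepSeek-R1-0528-FP4-V2}"

# Rewrite MODEL to its pre-staged local path for every model known to exist
# under /scratch/models on the B300 cluster; other models fall through.
if [[ "$MODEL" == "Qwen/Qwen3.5-397B-A17B-FP8" ]]; then
  export MODEL="/scratch/models/${MODEL#*/}"
elif [[ "$MODEL" == "nvidia/DeepSeek-R1-0528-FP4-V2" ]]; then
  # ${MODEL#*/} strips the org prefix:
  # nvidia/DeepSeek-R1-0528-FP4-V2 -> DeepSeek-R1-0528-FP4-V2
  export MODEL="/scratch/models/${MODEL#*/}"
fi

echo "$MODEL"   # /scratch/models/DeepSeek-R1-0528-FP4-V2 for the example input
```

This keeps the strip-prefix pattern used for Qwen and matches the explicit local path the multinode branch already uses for this model.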

@functionstackx functionstackx merged commit 9ebeafb into main Apr 17, 2026
54 of 63 checks passed
@functionstackx functionstackx deleted the claude/add-dsr1-fp4-b300-sglang branch April 17, 2026 08:18
