
Add B300 config: dsr1-fp4-sglang (non-MTP)#1049

Merged
functionstackx merged 4 commits into main from claude/add-dsr1-fp4-b300-sglang
Apr 17, 2026

Conversation

@functionstackx
Contributor

Summary

  • Adds dsr1-fp4-b300-sglang config (non-MTP DeepSeek-R1 FP4 on B300 via SGLang).
  • New benchmark script benchmarks/single_node/dsr1_fp4_b300.sh reuses the existing B200 DSR1 FP4 SGLang recipe as-is — the SGLang DSR1 cookbook does not yet have a B300-specific recipe. A comment at the top of the script records this.
  • Image bumped to lmsysorg/sglang:v0.5.10.post1-cu130 to match the standard B300 SGLang image already used by the Qwen3.5 B300 configs.
  • No changes to runners/launch_b300-nv.sh or .github/workflows/benchmark-tmpl.yml — already wired up by Add B300 config: qwen3.5-fp8-sglang-mtp #1035.

Test plan

  • Sweep picks up dsr1-fp4-b300-sglang and runs 1k1k / 8k1k across TP=4/EP=4 and TP=8/EP=8 search space
  • Results publish to inferencex.com and look sane relative to B200 DSR1 FP4 SGLang

🤖 Generated with Claude Code

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR for the recipe first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that all GitHub Actions jobs fully pass after merging. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Comment thread benchmarks/single_node/dsr1_fp4_b300.sh
Comment thread perf-changelog.yaml
@functionstackx functionstackx marked this pull request as draft April 17, 2026 06:48
functionstackx and others added 4 commits April 17, 2026 03:21
The SGLang cookbook does not have a B300-specific DSR1 FP4 recipe, so this
config reuses the existing B200 DSR1 FP4 SGLang recipe as-is until B300-
specific tuning is available. Image bumped to v0.5.10.post1-cu130 to match
the standard B300 SGLang image used by other B300 configs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Also mirror the note into nvidia-master.yaml and perf-changelog.yaml so
future readers can see why the B200 recipe is being reused for B300.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The /scratch/models override was unconditional, forcing every single-node
B300 job to use a pre-staged local model path. Only Qwen3.5-397B-A17B-FP8
is pre-staged there; DSR1 and other models need to fall through so their
benchmark scripts can `hf download` into the mounted cache.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@functionstackx functionstackx force-pushed the claude/add-dsr1-fp4-b300-sglang branch from ecabc90 to 20dbe26 Compare April 17, 2026 07:21
@functionstackx functionstackx marked this pull request as ready for review April 17, 2026 07:47
Comment thread runners/launch_b300-nv.sh
Comment on lines 219 to 227
```diff
 HF_HUB_CACHE_MOUNT="/scratch/models"
-export MODEL="/scratch/models/${MODEL#*/}"
+# Qwen3.5-397B-A17B-FP8 is pre-staged under /scratch/models on the B300 cluster,
+# so point MODEL at the local copy. Other models fall through and use `hf download`
+# against the mounted cache from their benchmark script.
+if [[ "$MODEL" == "Qwen/Qwen3.5-397B-A17B-FP8" ]]; then
+  export MODEL="/scratch/models/${MODEL#*/}"
+fi
 SQUASH_FILE="/data/squash/$(echo "$IMAGE" | sed 's/[\/:@#]/_/g').sqsh"
 FRAMEWORK_SUFFIX=$([[ "$FRAMEWORK" == "trt" ]] && printf '_trt' || printf '')
```
Contributor


🔴 The MODEL path rewrite in launch_b300-nv.sh was narrowed to apply only to Qwen/Qwen3.5-397B-A17B-FP8. For the new dsr1-fp4-b300-sglang config (model: nvidia/DeepSeek-R1-0528-FP4-V2), MODEL is never rewritten and remains the HuggingFace repo ID. Inside the B300 single-node container, HF_HUB_CACHE=/mnt/hf_hub_cache/ (set by CI) is exported via --export=ALL, but /mnt/hf_hub_cache/ is never mounted; only /scratch/models:/scratch/models is. As a result, hf download nvidia/DeepSeek-R1-0528-FP4-V2 cannot find the pre-staged model and may attempt a ~600 GB internet download, and --model-path nvidia/DeepSeek-R1-0528-FP4-V2 will cause SGLang to fail to start. Fix: add nvidia/DeepSeek-R1-0528-FP4-V2 to the if condition in the single-node else branch so MODEL is rewritten to /scratch/models/DeepSeek-R1-0528-FP4-V2, matching the Qwen pattern and consistent with what the multinode branch already does for this model.

Extended reasoning...

What the bug is and how it manifests

PR #1035 introduced the B300 single-node runner and originally rewrote MODEL unconditionally: export MODEL="/scratch/models/${MODEL#*/}". This PR changed that to a conditional that only rewrites when MODEL == "Qwen/Qwen3.5-397B-A17B-FP8", with a comment saying other models "fall through and use hf download". For the new dsr1-fp4-b300-sglang config (model: nvidia/DeepSeek-R1-0528-FP4-V2), MODEL therefore remains the raw HuggingFace repo ID when the benchmark script is invoked.

The specific code path that triggers it

benchmark-tmpl.yml line 74 sets HF_HUB_CACHE=/mnt/hf_hub_cache/. The B300 single-node runner sets HF_HUB_CACHE_MOUNT="/scratch/models" and mounts it via --container-mounts as /scratch/models:/scratch/models. It passes --export=ALL, which exports HF_HUB_CACHE=/mnt/hf_hub_cache/ into the container, but /mnt/hf_hub_cache/ is never mounted there. When dsr1_fp4_b300.sh runs inside the container, MODEL=nvidia/DeepSeek-R1-0528-FP4-V2, so line 23 (hf download "$MODEL") invokes huggingface-cli against HF_HUB_CACHE=/mnt/hf_hub_cache/, which does not exist in the container. The model cannot be found, and huggingface-cli will attempt a ~600 GB internet download or fail. Similarly, --model-path nvidia/DeepSeek-R1-0528-FP4-V2 on the SGLang launch command causes SGLang to attempt the same broken HF resolution path, preventing server startup entirely.

Why existing code does not prevent it

The PR comment says other models can use hf download against the mounted cache. However, the cache mount is at /scratch/models:/scratch/models, while HF_HUB_CACHE (exported from the host) points to /mnt/hf_hub_cache/. These paths are misaligned, so the HF cache lookup fails inside the container. B200 handles this correctly by mounting -v /raid/hf_hub_cache/:/mnt/hf_hub_cache/, making the mount point match the env var. B300 does not do this; it mounts at /scratch/models but exports HF_HUB_CACHE=/mnt/hf_hub_cache/.
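The misalignment is easiest to see side by side. The following is a sketch, not the exact runner invocations; flags are abbreviated and paths are taken from the analysis above:

```shell
# B200 runner (works): the mount target matches the env var HF tooling reads.
docker run -v /raid/hf_hub_cache/:/mnt/hf_hub_cache/ \
           -e HF_HUB_CACHE=/mnt/hf_hub_cache/ ...

# B300 single-node runner (misaligned): HF_HUB_CACHE survives --export=ALL,
# but nothing is mounted at /mnt/hf_hub_cache/ inside the container, so
# cache lookups miss and hf download falls back to the internet.
srun --container-mounts=/scratch/models:/scratch/models \
     --export=ALL ...   # HF_HUB_CACHE=/mnt/hf_hub_cache/ points at an unmounted path
```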

What the impact would be

The dsr1-fp4-b300-sglang benchmark will fail entirely: SGLang cannot locate the pre-staged model weights, so the server fails to start and all benchmark runs produce no results. At best, a ~600 GB download is attempted and times out; at worst the job fails immediately. This completely blocks the new config from producing any benchmark data.

Step-by-step proof

  1. CI runs a dsr1-fp4-b300-sglang job with model: nvidia/DeepSeek-R1-0528-FP4-V2 from nvidia-master.yaml.
  2. launch_b300-nv.sh enters the single-node else branch. The condition if [[ "$MODEL" == "Qwen/Qwen3.5-397B-A17B-FP8" ]] is false, so MODEL stays as nvidia/DeepSeek-R1-0528-FP4-V2.
  3. The runner mounts /scratch/models:/scratch/models and exports HF_HUB_CACHE=/mnt/hf_hub_cache/ via --export=ALL.
  4. Inside the container: /mnt/hf_hub_cache/ is not mounted; /scratch/models is mounted but HF does not know to look there.
  5. dsr1_fp4_b300.sh line 23: hf download "nvidia/DeepSeek-R1-0528-FP4-V2" — the CLI checks HF_HUB_CACHE=/mnt/hf_hub_cache/ (missing), then falls back, and initiates a download of the ~600 GB model.
  6. --model-path nvidia/DeepSeek-R1-0528-FP4-V2 on the SGLang server launch — SGLang makes the same HF lookup, fails to find local weights, and the server cannot start.
  7. Contrast with the multinode branch of the same script (lines 23-26): it explicitly sets MODEL_PATH="/scratch/models/deepseek-r1-0528-nvfp4-v2" for dsr1-fp4, confirming the model IS pre-staged at that path on the B300 cluster.
  8. Fix: add an elif clause to rewrite MODEL to /scratch/models/DeepSeek-R1-0528-FP4-V2 for nvidia/DeepSeek-R1-0528-FP4-V2, consistent with the strip-prefix pattern used for Qwen and with the multinode branch's explicit local path.
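The fix in step 8 can be sketched as follows. This is a sketch of the single-node else branch, not the verbatim contents of launch_b300-nv.sh; the example MODEL assignment is added here only to make the snippet self-contained:

```shell
#!/usr/bin/env bash
# Example input for the sketch; in the runner, MODEL comes from the config.
MODEL="${MODEL:-nvidia/DeepSeek-R1-0528-FP4-V2}"

# Rewrite MODEL to its pre-staged local path for every model known to exist
# under /scratch/models on the B300 cluster; other models fall through.
if [[ "$MODEL" == "Qwen/Qwen3.5-397B-A17B-FP8" ]]; then
  export MODEL="/scratch/models/${MODEL#*/}"
elif [[ "$MODEL" == "nvidia/DeepSeek-R1-0528-FP4-V2" ]]; then
  # ${MODEL#*/} strips the org prefix:
  # nvidia/DeepSeek-R1-0528-FP4-V2 -> DeepSeek-R1-0528-FP4-V2
  export MODEL="/scratch/models/${MODEL#*/}"
fi

echo "$MODEL"   # /scratch/models/DeepSeek-R1-0528-FP4-V2 for the example input
```

This keeps the strip-prefix pattern used for Qwen and matches the explicit local path the multinode branch already uses for this model.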

@functionstackx functionstackx merged commit 9ebeafb into main Apr 17, 2026
54 of 63 checks passed
@functionstackx functionstackx deleted the claude/add-dsr1-fp4-b300-sglang branch April 17, 2026 08:18
