
Switch dsv4 B200 SGLang to DeepEP dp-attention recipe #1140

Closed
yhyang201 wants to merge 15 commits into SemiAnalysisAI:main from yhyang201:chore/dsv4-sgl-b200-deepep

Conversation

@yhyang201
Collaborator

Summary

  • Replace dsv4 B200 SGLang launch command with DeepEP + dp-attention recipe
  • Switch from TP8 flashinfer_mxfp4 to TP8/DP8 with --moe-a2a-backend deepep and --enable-dp-attention
  • Reduce EAGLE spec decoding from 3 steps / 4 draft tokens to 1 step / 2 draft tokens
  • Add mega-MOE optimizations (SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE, SGLANG_OPT_USE_FAST_MASK_EP, etc.)
  • Add DeepEP dispatch/combine config with 96 SMs
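A rough sketch of the launch command these bullets describe, assembled only from the flags and env vars named in this PR. The exact spellings of the speculative-decoding flags are assumptions, and the committed script may order or value things differently:

```shell
# Hypothetical reconstruction -- only flags/env vars named in this PR's
# summary appear here; spec-decoding flag spellings are assumptions.
PYTHONNOUSERSITE=1 \
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE=1 \
SGLANG_OPT_USE_FAST_MASK_EP=1 \
sglang serve deepseek-ai/DeepSeek-V4-Pro \
  --tp 8 --dp 8 \
  --enable-dp-attention \
  --moe-a2a-backend deepep \
  --speculative-num-steps 1 \
  --speculative-num-draft-tokens 2
```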

Test plan

  • Sweep run produces results for 1k/1k and 8k/1k ISL/OSL on B200

🤖 Generated with Claude Code

cquil11 and others added 13 commits April 24, 2026 01:10
Adds the DeepSeek-V4-Flash B200 SGLang recipe from
https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V4.
Prefix caching and speculative decoding are disabled for baseline numbers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Uses deepseek-ai/DeepSeek-V4-Pro with tp=8, ep=8, dp-attention enabled
and sweep concurrency ranges aligned with dsv4-fp4-b200-vllm (4-1024 at
1k/1k, 4-512 at 8k/1k). Script now passes --enable-dp-attention when
DP_ATTENTION=true and sets --mem-fraction-static per the Pro recipe.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
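The sweep ranges above (4-1024 at 1k/1k, 4-512 at 8k/1k) can be sketched as a doubling series. The helper name and the power-of-two stepping are assumptions for illustration, not taken from the actual sweep script:

```shell
# Hypothetical helper: emit a space-separated doubling series of
# concurrency levels from a floor up to a ceiling.
concurrency_range() {
  local c=$1 max=$2 out=""
  while [ "$c" -le "$max" ]; do
    out="$out $c"
    c=$((c * 2))
  done
  echo "${out# }"
}
```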
Server launch now mirrors the DeepSeek-V4-Pro command from
https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V4:
--tp N, --moe-runner-backend flashinfer_mxfp4, --mem-fraction-static
0.82, SGLANG_JIT_DEEPGEMM_PRECOMPILE=0. Speculative decoding omitted
and --disable-radix-cache added per the no-spec / no-prefix-cache
baseline. YAML search-space drops ep/dp-attn to tp=8, ep=1.

Also syncs runners/launch_b200-dgxc-slurm.sh with the HF cache mount
path from origin/claude/add-dsv4-fp4-b200-vllm so both PRs stay in
agreement on runner layout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The deepseek-v4-blackwell image doesn't expose sglang via system
python3, so the module import fails:

  /usr/bin/python3: Error while finding module specification for
  'sglang.launch_server' (ModuleNotFoundError: No module named 'sglang')

Switch to the `sglang serve` entrypoint that the cookbook uses; the
CLI resolves the correct interpreter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
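The failing and working invocations, side by side; the trailing arguments are elided since they are not part of this fix:

```shell
# Fails inside the deepseek-v4-blackwell image: system python3 cannot
# see the image's sglang install, so module resolution raises
# ModuleNotFoundError.
python3 -m sglang.launch_server ...

# Works: the `sglang` console script is bound to the interpreter that
# owns the install, so the import resolves.
sglang serve ...
```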
The lmsysorg/sglang:deepseek-v4-blackwell image installs sglang editable
at /workspace/sglang/python — unlike every prior sglang tag which uses
/sgl-workspace/sglang. Our $GITHUB_WORKSPACE:/workspace/ bind-mount
masks that directory, breaking `import sglang`.

Conditionally mount at /ix for this image only and make the dsv4
benchmark script use $PWD for server/metrics/result paths so it works
regardless of the mount target. All other configs still mount at
/workspace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
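The conditional mount can be sketched as below. The helper name is invented for illustration; the /ix target and the image tag come from this commit:

```shell
# Hypothetical helper: pick the container mount target for a given image.
# deepseek-v4-blackwell editable-installs sglang under /workspace/sglang,
# so mounting $GITHUB_WORKSPACE at /workspace would mask it.
select_mount_target() {
  case "$1" in
    *deepseek-v4-blackwell*) echo "/ix" ;;  # avoid masking the editable install
    *) echo "/workspace" ;;                 # all other configs keep the standard mount
  esac
}
```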
The lmsysorg/sglang:deepseek-v4-blackwell image installs sglang editable at
/workspace/sglang/python, which our $GITHUB_WORKSPACE:/workspace/ bind-mount
masks. Temporary one-line workaround: pip install --no-deps sglang in the
benchmark script to restore a non-editable copy in site-packages. Runner
reverted to the standard /workspace mount. Marked with a TODO(Cam) for
the proper fix once lmsys publishes an image that doesn't editable-install
under /workspace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
'pip install --no-deps sglang' is a no-op when sglang is already
registered in site-packages -- even if the underlying editable path
is missing -- so the prior workaround never actually swapped in a
working install. Uninstall the broken egg-link first, then reinstall.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
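A sketch of the two-step workaround described above, assuming the editable install is what pip sees as the already-satisfied registration; the package name is from this PR:

```shell
# The editable registration makes a plain install a no-op, so remove it
# first, then pull a real copy into site-packages.
pip uninstall -y sglang
pip install --no-deps sglang
```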
Back to the proper mount fix so we use the same
'PYTHONNOUSERSITE=1 python3 -m sglang.launch_server ...' invocation as
every other sglang single_node script. Conditional mount target keeps
the blast radius to this one config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The image ENV pins CUDA_VISIBLE_DEVICES=4,5,6,7 (leftover from lmsys's
internal testing). With --no-container-entrypoint it isn't cleared, so
the container only sees 4 GPUs and TP=8 fails with
  torch.AcceleratorError: CUDA error: invalid device ordinal

Unset it at the top of the script so Slurm's 8-GPU allocation is visible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
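A minimal sketch of the fix, wrapped in a hypothetical helper for testability; the env var and its stale value are from the commit above:

```shell
# Hypothetical helper: the image's ENV bakes in CUDA_VISIBLE_DEVICES=4,5,6,7,
# which --no-container-entrypoint does not clear, so TP=8 sees only 4 GPUs.
# Unsetting it restores visibility of Slurm's full 8-GPU allocation.
clear_stale_gpu_pin() {
  unset CUDA_VISIBLE_DEVICES
}
```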
Only patched launch_b200-dgxc-slurm.sh last time; the b200-nb runner
still had the default $GITHUB_WORKSPACE:/workspace/ mount, which
masks the deepseek-v4-blackwell image's /workspace/sglang editable
install. Most B200 jobs in this repo run on b200-nb.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Only replace the sglang launch command, keep all surrounding logic intact.
Add PYTHONNOUSERSITE=1, SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2=1,
SGLANG_OPT_USE_TOPK_V2=1 env prefixes. Switch to sglang serve with
EAGLE speculative decoding (3 steps, topk=1, 4 draft tokens),
chunked prefill 4096, and disable-flashinfer-autotune.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace single-node TP8 flashinfer_mxfp4 recipe with TP8/DP8
dp-attention + DeepEP MoE backend. EAGLE spec decoding reduced to
1 step / 2 draft tokens. Adds mega-MOE optimizations and DeepEP
dispatch/combine config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@claude (bot) left a comment


Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

EAGLE speculative decoding is enabled in the benchmark script, so
the YAML search-space entries need spec-decoding: "mtp" to ensure
correct classification in config generation and eval selection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
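A sketch of the search-space annotation this commit adds; only the `spec-decoding: "mtp"` key is from the commit, and the surrounding keys are invented for illustration (the real schema lives in .github/configs/nvidia-master.yaml):

```yaml
# Hypothetical fragment -- only the spec-decoding key is documented above.
dsv4-fp4-b200-sglang:
  spec-decoding: "mtp"  # classify EAGLE runs correctly in config gen / eval selection
```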
@yhyang201
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b200-sglang

@github-actions
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24895270642
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b200-sglang
Pinned ref: 94272e1
Approval: required in environment 'Outside Collaborator E2E Test'.

Copy of dsv4_fp4_b200.sh with --use-chat-template added to
run_benchmark_serving, as required by AGENTS.md for MTP scripts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
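A sketch of the change described; the helper name is taken from the commit, and its other arguments are elided since this PR does not show them:

```shell
# Hypothetical: only --use-chat-template is the documented addition,
# required by AGENTS.md for MTP scripts.
run_benchmark_serving ... --use-chat-template
```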
@yhyang201
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b200-sglang

@github-actions
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24900476610
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b200-sglang
Pinned ref: dae8126
Approval: required in environment 'Outside Collaborator E2E Test'.

@cquil11
Collaborator

cquil11 commented Apr 26, 2026

closing in favor of #1187
