[NV] Add deepseek-v4-pro b300 vllm config#1144
Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook. If it is not, please create a PR there first before we merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you. PR authors are responsible for ensuring that all GitHub Action jobs fully pass after merging. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, PR authors should request a review and get a PR approval from the respective company's CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
@claude please add description |
Claude finished @kedarpotdar-nv's task in 52s — Adding PR Description
PR description has been added.
```yaml
    search-space:
      - { tp: 8, conc-start: 4, conc-end: 4 }
      - { tp: 4, conc-start: 4, conc-end: 128 }
      - { tp: 8, conc-start: 128, conc-end: 128 }
      - { tp: 4, dp-attn: true, conc-start: 256, conc-end: 512 }
  - isl: 8192
    osl: 1024
    search-space:
      - { tp: 8, conc-start: 4, conc-end: 4 }
      - { tp: 4, conc-start: 4, conc-end: 128 }
      - { tp: 8, conc-start: 128, conc-end: 128 }
      - { tp: 4, dp-attn: true, conc-start: 256, conc-end: 512 }
```
🔴 All 4 search-space entries for dsv4-fp8-b300-vllm (nvidia-master.yaml:2402-2413) omit the ep field, so generate_sweep_configs.py defaults each matrix entry to ep=1. But benchmarks/single_node/dsv4_fp8_b300.sh always passes --enable-expert-parallel, meaning the actual EP is 8 (for tp:8), 4 (for tp:4), or 4 (for tp:4/dp-attn:true) — never 1. Downstream metadata (RESULT_FILENAME, process_result.py, compare_results.py/summarize.py grouping keys) will therefore record ep=1 for every data point. Fix by adding ep: 8 to the two tp:8 entries and ep: 4 to the two tp:4 entries, mirroring the adjacent dsv4-fp8-h200-vllm config and PR #919's metadata cleanup.
Extended reasoning...
What the bug is. The newly added dsv4-fp8-b300-vllm block (.github/configs/nvidia-master.yaml:2388-2413) declares four search-space entries across its two seq-len configs and none of them set the ep field: {tp:8,...}, {tp:4,...}, {tp:8,...}, {tp:4,dp-attn:true,...}. In contrast, the sibling dsv4-fp8-h200-vllm at line 2385 correctly specifies ep: 8, which is the established convention for MoE configs in this file.
Why the default is wrong for this recipe. utils/matrix_logic/generate_sweep_configs.py:354 initializes Fields.EP.value to 1 for single-node entries and only overrides it (lines 362-363) when ep is explicitly present in the YAML entry. So every generated matrix row for this config gets ep=1. However, benchmarks/single_node/dsv4_fp8_b300.sh unconditionally passes --enable-expert-parallel on the vllm serve command (line ~76 of the new script), independent of TP or DP_ATTENTION. With vLLM's expert-parallel semantics, the effective expert-parallel degree equals the world size (TP × DP), so the runtime EP is 8 or 4, never 1.
How the metadata mismatch propagates. The EP value from the matrix becomes EP_SIZE via .github/workflows/benchmark-tmpl.yml:85, and that value is then (a) embedded in RESULT_FILENAME at line 146 as ep${EP_SIZE}, (b) written into the aggregated JSON by utils/process_result.py:100-108 as data['ep'] = ep_size, (c) used as a grouping key in utils/summarize.py:82,104, and (d) forms the tp{tp}/ep{ep} lookup key in utils/compare_results.py:244. So every single B300 result file for this PR will be named ...ep1... and every aggregated data point will claim ep: 1, while the actual run executed with EP=4 or EP=8. Any downstream baseline comparison or eval grouping will key on a value that doesn't exist in the launched recipe.
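The propagation described above can be sketched as follows. This is a minimal illustration, not the repo's code: the filename template and field names are assumptions inferred from the `..._tp4-ep1-dpaFalse_...` example in this thread.

```python
# Hypothetical sketch of how the matrix ep value flows into the result
# filename and the aggregated JSON grouping keys. Names and the filename
# template are assumptions, not copied from the repo.

def result_metadata(tp: int, ep: int, dp_attn: bool) -> tuple[str, dict]:
    # benchmark-tmpl.yml exports EP_SIZE from the matrix; the filename and
    # the aggregated JSON both record that value, whatever it is.
    filename = f"tp{tp}-ep{ep}-dpa{dp_attn}"
    data = {"tp": tp, "ep": ep, "dp_attn": dp_attn}  # grouping keys
    return filename, data

# A search-space entry without an explicit ep gets the default ep=1:
name, data = result_metadata(tp=4, ep=1, dp_attn=False)
print(name)        # tp4-ep1-dpaFalse
print(data["ep"])  # 1, even though the launched recipe ran with EP=4
```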
Step-by-step proof for the second entry (tp:4, conc 4-128 on 1k1k):

- YAML entry: `{ tp: 4, conc-start: 4, conc-end: 128 }` (no `ep` key).
- generate_sweep_configs.py:354 seeds the row with `ep: 1` (the default) and the tp override sets `tp: 4`; lines 362-363 do not run because `'ep'` is not in the dict.
- The matrix row is emitted with `tp=4, ep=1, dp-attn=false`.
- benchmark-tmpl.yml:85 exports `EP_SIZE=1`; line 146 stamps the result file as `..._tp4-ep1-dpaFalse_...`.
- The launch script enters the else-branch (DP_ATTENTION != true), so PARALLEL_ARGS is `--tensor-parallel-size 4 --data-parallel-size 1`, and `--enable-expert-parallel` is always present → vLLM runs with TP=4, DP=1, EP enabled over world size 4 → effective EP=4.
- process_result.py reads `EP_SIZE=1` from the environment and writes `{'ep': 1, ...}` to the JSON: the recorded `ep` field is 1, while the actual EP used was 4.
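The defaulting step at the heart of this proof can be sketched in a few lines. This is an illustration of the behavior described above, assuming a simplified entry shape; `build_matrix_row` and its field names are hypothetical, not the actual function in generate_sweep_configs.py.

```python
# Hypothetical sketch of the ep defaulting behavior attributed to
# generate_sweep_configs.py: each matrix row seeds ep=1 and only an
# explicit 'ep' key in the YAML entry overrides it.

def build_matrix_row(entry: dict) -> dict:
    row = {"tp": 1, "ep": 1, "dp_attn": False}  # ep defaults to 1
    row["tp"] = entry.get("tp", row["tp"])
    if "ep" in entry:  # only an explicit key overrides the default
        row["ep"] = entry["ep"]
    row["dp_attn"] = entry.get("dp-attn", False)
    return row

# The second B300 entry omits 'ep', so the matrix records ep=1 even
# though the launch script always enables expert parallelism:
row = build_matrix_row({"tp": 4, "conc-start": 4, "conc-end": 128})
print(row)  # {'tp': 4, 'ep': 1, 'dp_attn': False}
```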
Why this was not caught earlier. There is no validation that cross-references --enable-expert-parallel in a launch script against the ep field in matrix entries; the coupling is by convention. This is precisely the class of mismatch that PR #919 ('Fix metadata inconsistencies in nvidia-master.yaml - TP/EP/DP-attn values now match actual recipe files') was created to clean up, and that the gptoss-fp4-* and dsr1-fp4-* changelogs repeatedly reference ('Explicitly add EP=TP for DP attention configs', 'Set ep:4 for all tp:4 entries, ep:8 for all tp:8 entries').
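A lint of the kind missing here could be sketched as follows. This checker does not exist in the repo; it is a hypothetical example of cross-referencing a launch script's flags against the YAML search-space entries.

```python
# Hypothetical validation (not part of the repo) for the mismatch class
# described above: if a config's launch script passes
# --enable-expert-parallel, every search-space entry must set ep explicitly.

def check_ep_metadata(script_text: str, search_space: list[dict]) -> list[str]:
    errors = []
    if "--enable-expert-parallel" in script_text:
        for i, entry in enumerate(search_space):
            if "ep" not in entry:
                errors.append(f"entry {i}: script enables EP but 'ep' is unset")
    return errors

search_space = [
    {"tp": 8, "conc-start": 4, "conc-end": 4},           # missing ep -> flagged
    {"tp": 4, "ep": 4, "conc-start": 4, "conc-end": 128},  # explicit ep -> ok
]
print(check_ep_metadata("vllm serve ... --enable-expert-parallel", search_space))
```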
Fix. Add explicit `ep` to each B300 search-space entry to match the launched EP:

- `{ tp: 8, ep: 8, conc-start: 4, conc-end: 4 }`
- `{ tp: 4, ep: 4, conc-start: 4, conc-end: 128 }`
- `{ tp: 8, ep: 8, conc-start: 128, conc-end: 128 }`
- `{ tp: 4, ep: 4, dp-attn: true, conc-start: 256, conc-end: 512 }`
This mirrors the adjacent dsv4-fp8-h200-vllm convention (ep: 8 for tp: 8, dp-attn: true) and keeps RESULT_FILENAME, process_result.py, and compare_results.py in sync with the actual runtime EP. This is a metadata-only change; no recipe-file changes are required.
Force-pushed from ce343aa to 0e86648
Use _${FRAMEWORK} suffix for dsv4 model on B300 so vllm and sglang
configs resolve to distinct benchmark scripts. Other models keep the
existing _trt-only logic. Rename dsv4_fp4_b300.sh to dsv4_fp4_b300_sglang.sh
to match the new convention.
Force-pushed from 0e86648 to 6a0fa73
# Conflicts:
#   benchmarks/single_node/dsv4_fp4_b300_vllm.sh
#   perf-changelog.yaml
#   runners/launch_b300-nv.sh
functionstackx left a comment:
lgtm, resolve conflicts and merge tonight plz
great work @Ankur-singh
Add new B300 vLLM config `dsv4-fp4-b300-vllm` for DeepSeek-V4-Pro single-node benchmarks.