[NV] Add deepseek-v4-pro b300 vllm config#1144

Merged
Ankur-singh merged 6 commits intomainfrom
nv/b300-vllm-config
Apr 25, 2026

Conversation

@Ankur-singh
Collaborator

@Ankur-singh Ankur-singh commented Apr 24, 2026

Add new B300 vLLM config dsv4-fp4-b300-vllm for DeepSeek-V4-Pro single-node benchmarks.

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

@kedarpotdar-nv
Collaborator

@claude please add description

@Klaud-Cold
Collaborator

Klaud-Cold commented Apr 24, 2026

Claude finished @kedarpotdar-nv's task in 52s (View job)


Adding PR Description

  • Analyze PR changes
  • Draft and update PR description

PR description has been added. It includes:

  • A summary of the new B300 vLLM config for DeepSeek-V4-Pro
  • A comparison table showing key differences vs the existing H200 config
  • The full recipe flags and search space details
  • Link to the companion H200 PR (Add H200 config: dsv4-fp8-vllm (DeepSeek-V4-Pro) #1130)
  • A test plan checklist

Comment on lines +2402 to +2413

```yaml
  search-space:
    - { tp: 8, conc-start: 4, conc-end: 4 }
    - { tp: 4, conc-start: 4, conc-end: 128 }
    - { tp: 8, conc-start: 128, conc-end: 128 }
    - { tp: 4, dp-attn: true, conc-start: 256, conc-end: 512 }
- isl: 8192
  osl: 1024
  search-space:
    - { tp: 8, conc-start: 4, conc-end: 4 }
    - { tp: 4, conc-start: 4, conc-end: 128 }
    - { tp: 8, conc-start: 128, conc-end: 128 }
    - { tp: 4, dp-attn: true, conc-start: 256, conc-end: 512 }
```
Contributor

🔴 All 4 search-space entries for dsv4-fp8-b300-vllm (nvidia-master.yaml:2402-2413) omit the ep field, so generate_sweep_configs.py defaults each matrix entry to ep=1. But benchmarks/single_node/dsv4_fp8_b300.sh always passes --enable-expert-parallel, meaning the actual EP is 8 (for tp:8), 4 (for tp:4), or 4 (for tp:4/dp-attn:true) — never 1. Downstream metadata (RESULT_FILENAME, process_result.py, compare_results.py/summarize.py grouping keys) will therefore record ep=1 for every data point. Fix by adding ep: 8 to the two tp:8 entries and ep: 4 to the two tp:4 entries, mirroring the adjacent dsv4-fp8-h200-vllm config and PR #919's metadata cleanup.

Extended reasoning...

What the bug is. The newly added dsv4-fp8-b300-vllm block (.github/configs/nvidia-master.yaml:2388-2413) declares four search-space entries across its two seq-len configs and none of them set the ep field: {tp:8,...}, {tp:4,...}, {tp:8,...}, {tp:4,dp-attn:true,...}. In contrast, the sibling dsv4-fp8-h200-vllm at line 2385 correctly specifies ep: 8, which is the established convention for MoE configs in this file.

Why the default is wrong for this recipe. utils/matrix_logic/generate_sweep_configs.py:354 initializes Fields.EP.value to 1 for single-node entries and only overrides it (lines 362-363) when ep is explicitly present in the YAML entry. So every generated matrix row for this config gets ep=1. However, benchmarks/single_node/dsv4_fp8_b300.sh unconditionally passes --enable-expert-parallel on the vllm serve command (line ~76 of the new script), independent of TP or DP_ATTENTION. With vLLM's expert-parallel semantics, the effective expert-parallel degree equals the world size (TP × DP), so the runtime EP is 8 or 4, never 1.
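
The defaulting behavior described above can be sketched as follows. This is a simplified stand-in for the logic attributed to generate_sweep_configs.py, not the actual code; the function name and dict shape are illustrative assumptions:

```python
# Simplified sketch of the ep-defaulting behavior described in the review.
# Not the actual generate_sweep_configs.py code.
def expand_entry(entry):
    row = {"tp": 1, "ep": 1, "dp-attn": False}  # ep is seeded to 1
    for key in ("tp", "ep", "dp-attn"):
        if key in entry:  # only an explicit YAML key overrides the default
            row[key] = entry[key]
    return row

# An entry that omits ep keeps the ep=1 default, even though the launch
# script always enables expert parallelism at runtime.
assert expand_entry({"tp": 4, "conc-start": 4})["ep"] == 1
assert expand_entry({"tp": 4, "ep": 4})["ep"] == 4
```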

How the metadata mismatch propagates. The EP value from the matrix becomes EP_SIZE via .github/workflows/benchmark-tmpl.yml:85, and that value is then (a) embedded in RESULT_FILENAME at line 146 as ep${EP_SIZE}, (b) written into the aggregated JSON by utils/process_result.py:100-108 as data['ep'] = ep_size, (c) used as a grouping key in utils/summarize.py:82,104, and (d) forms the tp{tp}/ep{ep} lookup key in utils/compare_results.py:244. So every single B300 result file for this PR will be named ...ep1... and every aggregated data point will claim ep: 1, while the actual run executed with EP=4 or EP=8. Any downstream baseline comparison or eval grouping will key on a value that doesn't exist in the launched recipe.
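
To make the divergence concrete, here is a rough sketch of how the matrix ep value lands in the result filename. The `tp{tp}-ep{ep}-dpa{dpa}` format is inferred from the `..._tp4-ep1-dpaFalse_...` example in the step-by-step proof and is an assumption about the exact stamp:

```python
# Hypothetical illustration of the filename stamp described in the review;
# the exact format is assumed from the tp4-ep1-dpaFalse example.
def result_stamp(tp: int, ep: int, dp_attn: bool) -> str:
    return f"tp{tp}-ep{ep}-dpa{dp_attn}"

# The matrix records ep=1, but the launch script runs with expert
# parallelism over the world size (TP=4, DP=1 -> effective EP=4).
matrix_ep = 1
effective_ep = 4 * 1
assert result_stamp(4, matrix_ep, False) == "tp4-ep1-dpaFalse"
assert matrix_ep != effective_ep  # recorded metadata disagrees with runtime
```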

Step-by-step proof for the second entry (tp:4, conc 4-128 on 1k1k).

  1. YAML entry: { tp: 4, conc-start: 4, conc-end: 128 } — no ep key.
  2. generate_sweep_configs.py:354 seeds the row with ep: 1 (default) and the tp override sets tp: 4; line 362-363 does not run because 'ep' is not in the dict.
  3. Matrix row is emitted with tp=4, ep=1, dp-attn=false.
  4. benchmark-tmpl.yml:85 exports EP_SIZE=1; line 146 stamps the result file as ..._tp4-ep1-dpaFalse_....
  5. The launch script enters the else-branch (DP_ATTENTION != true), so PARALLEL_ARGS=--tensor-parallel-size 4 --data-parallel-size 1, and --enable-expert-parallel is always present → vLLM runs with TP=4, DP=1, EP enabled over world size 4 → effective EP=4.
  6. process_result.py reads EP_SIZE=1 from env and writes {'ep': 1, ...} to the JSON — the ep field recorded is 1, the actual EP used was 4.

Why this was not caught earlier. There is no validation that cross-references --enable-expert-parallel in a launch script against the ep field in matrix entries; the coupling is by convention. This is precisely the class of mismatch that PR #919 ('Fix metadata inconsistencies in nvidia-master.yaml - TP/EP/DP-attn values now match actual recipe files') was created to clean up, and that the gptoss-fp4-* and dsr1-fp4-* changelogs repeatedly reference ('Explicitly add EP=TP for DP attention configs', 'Set ep:4 for all tp:4 entries, ep:8 for all tp:8 entries').
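
A lightweight guard against this class of mismatch could cross-check the two sources of truth. The checker below is entirely hypothetical (no such validation exists in the repo, per the review); it encodes the review's expectation that when a script always enables expert parallelism, each entry's ep should equal its effective parallel degree:

```python
# Hypothetical lint: if the launch script always passes
# --enable-expert-parallel, every matrix entry should declare an explicit
# ep equal to tp * dp (the effective expert-parallel degree).
def check_entries(entries, script_enables_ep: bool):
    problems = []
    for e in entries:
        tp = e.get("tp", 1)
        dp = e.get("dp", 1)
        expected = tp * dp if script_enables_ep else 1
        if e.get("ep", 1) != expected:
            problems.append((e, expected))
    return problems

# All three B300-style entries omit ep, so all three are flagged.
entries = [{"tp": 8}, {"tp": 4}, {"tp": 4, "dp-attn": True}]
assert len(check_entries(entries, script_enables_ep=True)) == 3
```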

Fix. Add explicit ep to each B300 search-space entry to match the launched EP:

  • { tp: 8, ep: 8, conc-start: 4, conc-end: 4 }
  • { tp: 4, ep: 4, conc-start: 4, conc-end: 128 }
  • { tp: 8, ep: 8, conc-start: 128, conc-end: 128 }
  • { tp: 4, ep: 4, dp-attn: true, conc-start: 256, conc-end: 512 }

This mirrors the adjacent dsv4-fp8-h200-vllm convention (ep: 8 for tp: 8, dp-attn: true) and keeps RESULT_FILENAME/process_result.py/compare_results.py in sync with the actual runtime EP. Purely metadata-only — no recipe-file changes required.
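
Applied to the snippet under review, the corrected block would read as follows (only fields visible in the diff are shown; the indentation and surrounding keys of the config block are assumptions):

```yaml
  search-space:
    - { tp: 8, ep: 8, conc-start: 4, conc-end: 4 }
    - { tp: 4, ep: 4, conc-start: 4, conc-end: 128 }
    - { tp: 8, ep: 8, conc-start: 128, conc-end: 128 }
    - { tp: 4, ep: 4, dp-attn: true, conc-start: 256, conc-end: 512 }
- isl: 8192
  osl: 1024
  search-space:
    - { tp: 8, ep: 8, conc-start: 4, conc-end: 4 }
    - { tp: 4, ep: 4, conc-start: 4, conc-end: 128 }
    - { tp: 8, ep: 8, conc-start: 128, conc-end: 128 }
    - { tp: 4, ep: 4, dp-attn: true, conc-start: 256, conc-end: 512 }
```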

@Ankur-singh Ankur-singh force-pushed the nv/b300-vllm-config branch from ce343aa to 0e86648 Compare April 24, 2026 22:19
Use _${FRAMEWORK} suffix for dsv4 model on B300 so vllm and sglang
configs resolve to distinct benchmark scripts. Other models keep the
existing _trt-only logic. Rename dsv4_fp4_b300.sh to dsv4_fp4_b300_sglang.sh
to match the new convention.
# Conflicts:
#	benchmarks/single_node/dsv4_fp4_b300_vllm.sh
#	perf-changelog.yaml
#	runners/launch_b300-nv.sh
Contributor

@functionstackx functionstackx left a comment


lgtm, resolve conflicts and merge tonight plz

great work @Ankur-singh

@Ankur-singh Ankur-singh merged commit 9da017e into main Apr 25, 2026
8 of 32 checks passed
@Ankur-singh Ankur-singh deleted the nv/b300-vllm-config branch April 25, 2026 02:43

4 participants