
dsv4-fp4-b200-sglang: revert b200 portion of #1158 (#1186)

Merged

cquil11 merged 1 commit into main from
chore/revert-dsv4-fp4-b200-sglang-from-1158
Apr 26, 2026
Conversation

Collaborator

@cquil11 cquil11 commented Apr 26, 2026

Summary

Mirror of #1184 for the b200 side. Reverts the b200-specific changes from #1158 to their pre-#1158 baseline (= post-#1131 state).

What's reverted

  • benchmarks/single_node/dsv4_fp4_b200.sh — restored to its post-#1131 form ([NVIDIA] chore: B200 single node DeepSeek v4 SGLang). Drops the DP_ATTENTION env knob, the SGLANG_OPT_* env block, and the dual PARALLEL_ARGS branches; restores the original CONC-based recipe dispatch (low-latency / balanced / max-throughput selected by CONC inside the script).
  • dsv4-fp4-b200-sglang block in nvidia-master.yaml — un-pins image: lmsysorg/sglang:deepseek-v4-blackwell (drops the @sha256:df18bfc4... digest), and restores conc-start: 4 in the low-latency rows for both 1k1k and 8k1k (was conc-start: 1).
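For illustration, the restored CONC-keyed dispatch could be sketched roughly as below. The `select_recipe` helper, the CONC thresholds, and the flag spellings are all hypothetical; only the three recipe names and the CONC / PARALLEL_ARGS variable names come from the description above.

```shell
# Hypothetical sketch of a CONC-based recipe dispatch (not the actual script).
# Thresholds and flags are illustrative assumptions.
select_recipe() {
  CONC="$1"
  if [ "$CONC" -le 32 ]; then
    RECIPE=low-latency       # tp=8 ep=1, no DP attention
    PARALLEL_ARGS="--tp 8 --ep 1"
  elif [ "$CONC" -le 128 ]; then
    RECIPE=balanced          # tp=8 ep=8 with DP attention
    PARALLEL_ARGS="--tp 8 --ep 8 --enable-dp-attention"
  else
    RECIPE=max-throughput    # tp=8 ep=8 with DP attention, highest concurrency
    PARALLEL_ARGS="--tp 8 --ep 8 --enable-dp-attention"
  fi
  echo "$RECIPE"
}
```

The point of the sketch is that recipe selection is driven entirely by the CONC value the sweep passes in, with no separate DP_ATTENTION knob.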

Not touched

Test plan

  • Sweep run on B200 against the restored matrix (low-latency tp=8 ep=1 conc 4-32, balanced tp=8 ep=8 dp-attn conc 64-128, max-throughput tp=8 ep=8 dp-attn conc 256-{512,1024}).
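To make the restored matrix concrete, here is a hedged sketch of what the relevant nvidia-master.yaml rows could look like after this revert. The field names and layout are assumptions inferred from the PR description, not copied from the actual file; only the image name, the un-pinned digest, and conc-start: 4 are stated in the source.

```yaml
# Illustrative fragment only — structure and field names are assumed.
dsv4-fp4-b200-sglang:
  image: lmsysorg/sglang:deepseek-v4-blackwell   # @sha256 digest pin dropped by this revert
  matrix:
    - name: 1k1k-low-latency
      conc-start: 4        # restored (was 1 under #1158)
    - name: 8k1k-low-latency
      conc-start: 4        # restored (was 1 under #1158)
```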

🤖 Generated with Claude Code

Mirrors the b300 revert in #1184. Restores benchmarks/single_node/dsv4_fp4_b200.sh and the dsv4-fp4-b200-sglang block in nvidia-master.yaml to their pre-#1158 state (= post-#1131 baseline): un-pins the image digest and restores conc-start=4 in the low-latency rows.

No perf-changelog edit needed; #1158 did not add a b200 changelog entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

1 similar comment

@cquil11 cquil11 merged commit 1d0a9f0 into main Apr 26, 2026
8 checks passed
@cquil11 cquil11 deleted the chore/revert-dsv4-fp4-b200-sglang-from-1158 branch April 26, 2026 20:10
cquil11 added a commit that referenced this pull request Apr 26, 2026
…1158)

Mirror of #1185 for the b200 side. Re-applies the b200-specific
changes from #1158 on top of the #1186 baseline.

- Image pinned to lmsysorg/sglang:deepseek-v4-blackwell@sha256:df18bfc4...
- Adds DP_ATTENTION env knob and SGLANG_OPT_* perf env vars
- Search space gets conc-start=1 in low-latency rows (was 4)
- Recipe-per-CONC dispatch in script: low-latency / balanced /
  max-throughput selected by DP_ATTENTION + CONC

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

@claude claude Bot left a comment


Mechanical benchmark-config revert mirroring #1184; restores the b200 script and nvidia-master.yaml entries to their post-#1131 baseline.


Overview

This PR mirrors the already-merged #1184 for the b200 side. It reverts the b200-specific changes from #1158 by restoring benchmarks/single_node/dsv4_fp4_b200.sh to a CONC-based recipe dispatch (low-latency / balanced / max-throughput) and updating two entries in .github/configs/nvidia-master.yaml (un-pin the sglang image digest and restore conc-start: 4 in the low-latency rows).

Security risks

None — the changes are confined to a CI benchmark shell script and a YAML config file used only for benchmark sweeps. No auth/crypto/permissions code is touched.

Level of scrutiny

Low. This is benchmark/CI sweep infrastructure, not production runtime code. The change is a straightforward revert to a previously-known-good state, parallel in shape to #1184 which has already landed.

Other factors

The bug-hunting system found no bugs. The only oddity is the un-pinning of the sglang image digest, which gives up reproducibility — but that matches the pre-#1158 baseline behavior the PR explicitly aims to restore, and the b300 side made the same choice in #1184.

cquil11 added a commit that referenced this pull request Apr 26, 2026
…C split (#1187)

* dsv4-fp4-b200-sglang: recipe-per-CONC dispatch (re-apply b200 part of #1158)

Mirror of #1185 for the b200 side. Re-applies the b200-specific
changes from #1158 on top of the #1186 baseline.

- Image pinned to lmsysorg/sglang:deepseek-v4-blackwell@sha256:df18bfc4...
- Adds DP_ATTENTION env knob and SGLANG_OPT_* perf env vars
- Search space gets conc-start=1 in low-latency rows (was 4)
- Recipe-per-CONC dispatch in script: low-latency / balanced /
  max-throughput selected by DP_ATTENTION + CONC

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: add pr-link for #1187

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* dsv4-fp4-b200-sglang: restore --disable-radix-cache flag

The flag was accidentally dropped during the recipe-per-CONC rewrite.
Restoring it to match the baseline methodology (prefix caching disabled)
and stay consistent with all other dsv4 sister scripts.

Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update changelog and YAML comments to match two-way DP_ATTENTION dispatch

The script has two branches (DP_ATTENTION true/false), not three CONC-keyed
recipes. Both balanced and max-throughput rows use the same DP-attention +
DeepEP flags — only --max-running-requests differs. Updated the
nvidia-master.yaml comment block and perf-changelog description to accurately
reflect this two-recipe dispatch.

Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
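The two-way dispatch described in the follow-up commit above can be sketched as follows. The `build_args` helper, the exact flag spellings, and all numeric values are hypothetical; the source only establishes that there are two branches keyed on DP_ATTENTION, that both DP-attention rows share the same parallelism flags, and that only --max-running-requests differs between balanced and max-throughput.

```shell
# Hypothetical sketch of the two-branch DP_ATTENTION dispatch (not the actual
# script). Flag names and values are illustrative assumptions.
build_args() {
  DP_ATTENTION="$1"   # "true" or "false", set per YAML row
  CONC="$2"
  if [ "$DP_ATTENTION" = true ]; then
    # Balanced and max-throughput share the same DP-attention + DeepEP flags...
    ARGS="--tp 8 --ep 8 --enable-dp-attention --enable-deepep-moe"
    # ...and differ only in the request cap.
    if [ "$CONC" -ge 256 ]; then
      ARGS="$ARGS --max-running-requests 1024"   # max-throughput rows
    else
      ARGS="$ARGS --max-running-requests 128"    # balanced rows
    fi
  else
    ARGS="--tp 8 --ep 1"                         # low-latency rows
  fi
  echo "$ARGS"
}
```

Read this way, the "two recipes" are really one DP-attention configuration with a CONC-dependent request cap, plus a separate low-latency configuration.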