fix sgl b200/b300 dpsk-v4 script #1158

Merged
Qiaolin-Yu merged 13 commits into SemiAnalysisAI:main from Qiaolin-Yu:main
Apr 26, 2026

Conversation

@Qiaolin-Yu
Collaborator

No description provided.

Contributor

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@Fridge003
Collaborator

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b200-sglang

@Fridge003
Collaborator

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@Fridge003 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24943006152
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b200-sglang
Pinned ref: 3141900
Approval: not required (trusted collaborator).

@github-actions
Contributor

@Fridge003 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24943008546
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 3141900
Approval: not required (trusted collaborator).

@Qiaolin-Yu changed the title from "fix sgl b200/b300 dpsk-v4 script" to "[wip] fix sgl b200/b300 dpsk-v4 script" on Apr 26, 2026
@Qiaolin-Yu
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b200-sglang

@Qiaolin-Yu
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24945687144
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b200-sglang
Pinned ref: 485cb8e
Approval: not required (trusted collaborator).

@github-actions
Contributor

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24945689122
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 485cb8e
Approval: not required (trusted collaborator).

@Qiaolin-Yu
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24947777157
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 6f9ab84
Approval: not required (trusted collaborator).

@Qiaolin-Yu changed the title from "[wip] fix sgl b200/b300 dpsk-v4 script" to "fix sgl b200/b300 dpsk-v4 script" on Apr 26, 2026
@Qiaolin-Yu
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24948424840
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 7a4d415
Approval: not required (trusted collaborator).

@Qiaolin-Yu
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24948791360
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 0f975c8
Approval: not required (trusted collaborator).

hnyls2002 added a commit to Qiaolin-Yu/InferenceX that referenced this pull request Apr 26, 2026
@Qiaolin-Yu
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Contributor

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24953046586
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 7273032
Approval: not required (trusted collaborator).

@Qiaolin-Yu merged commit 8a174e0 into SemiAnalysisAI:main on Apr 26, 2026
13 of 14 checks passed
@claude claude Bot mentioned this pull request Apr 26, 2026
Qiaolin-Yu pushed a commit that referenced this pull request Apr 26, 2026
* sglang dsv4 mtp

* knob-driven recipe selection

* self-contained mtp config; recipe via dp-attn

* add mtp_1 (1/1/2) variant

* knob-driven recipe selection

* pin sglang image to mega_moe-capable digest

* drop mtp_1 knob; align with PR #1158 image digest

* update nvidia-master.yaml

* fix: restore trailing newline in perf-changelog.yaml

* fix: remove --use-chat-template and floor --max-running-requests at 8

The tokenizer for DSv4-Pro has no chat_template set, so
--use-chat-template causes benchmark_serving.py to crash with
ValueError. Remove it to align with dsv4_fp4_b300_sglang.sh.

Also add a floor of 8 to --max-running-requests to match the
base script and avoid too-low values at low concurrency.
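
The floor described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name `max_running_requests` is hypothetical, and deriving the unfloored value directly from the benchmark concurrency is an assumption that the commit message does not state.

```python
FLOOR = 8  # floor value named in the commit message


def max_running_requests(conc: int, floor: int = FLOOR) -> int:
    """Return a value for --max-running-requests, never below `floor`.

    Assumption: the unfloored value equals the benchmark concurrency.
    """
    if conc < 1:
        raise ValueError("conc must be a positive integer")
    return max(conc, floor)


if __name__ == "__main__":
    for conc in (1, 4, 8, 32):
        print(conc, max_running_requests(conc))
```

With this shape, low-concurrency runs (for example conc=1 or conc=4) are raised to 8, while higher concurrencies pass through unchanged.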

* perf-changelog: add dsv4-fp4-b300-sglang-mtp entry

Rebase perf-changelog.yaml on latest main (preserving #1173 and #1174
entries) and append the MTP config entry for PR #1166.

* dsv4-b300-sglang-mtp: tune EAGLE spec params from (3,1,4) to (4,1,5)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revert "dsv4-b300-sglang-mtp: tune EAGLE spec params from (3,1,4) to (4,1,5)"

This reverts the EAGLE spec params back to (3, 1, 4):
  --speculative-num-steps 3
  --speculative-eagle-topk 1
  --speculative-num-draft-tokens 4
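
As a reading aid, the (steps, topk, draft tokens) shorthand used in these commit titles can be expanded into the three flags listed above. The sketch below is illustrative only: the `EagleSpec` type and `spec_flags` helper are hypothetical names, and only the three flag names and the (3, 1, 4) values come from the commit message.

```python
from typing import List, NamedTuple


class EagleSpec(NamedTuple):
    """Hypothetical container for the (steps, topk, draft tokens) triple."""
    num_steps: int
    eagle_topk: int
    num_draft_tokens: int


def spec_flags(spec: EagleSpec) -> List[str]:
    """Expand the triple into the command-line flags named in the commit."""
    return [
        "--speculative-num-steps", str(spec.num_steps),
        "--speculative-eagle-topk", str(spec.eagle_topk),
        "--speculative-num-draft-tokens", str(spec.num_draft_tokens),
    ]


REVERTED = EagleSpec(3, 1, 4)  # values restored by this revert

if __name__ == "__main__":
    print(" ".join(spec_flags(REVERTED)))
```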

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Bryan Shan <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
cquil11 added a commit that referenced this pull request Apr 26, 2026
* dsv4-fp4-b300-sglang: revert to #1143 low-latency-only baseline

Reverts the matrix expansion (#1132), script edits (#1158, #1173, #1174),
and changelog retriggers (#1178) on top of the original #1143 entry.
Restores the script and config block to their #1143 state and clears
all prior dsv4-fp4-b300-sglang changelog entries to start fresh.

The dsv4-fp4-b300-sglang-mtp config (#1166) is untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: add pr-link for #1184

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: keep only the original #1143 entry, drop new entry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cquil11 added a commit that referenced this pull request Apr 26, 2026
Mirrors the b300 revert in #1184. Restores benchmarks/single_node/
dsv4_fp4_b200.sh and the dsv4-fp4-b200-sglang block in nvidia-master.yaml
to their pre-#1158 state (= post-#1131 baseline) — un-pins the image
digest and restores conc-start=4 in the low-latency rows.

No perf-changelog edit needed; #1158 did not add a b200 changelog entry.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cquil11 added a commit that referenced this pull request Apr 26, 2026
…1158)

Mirror of #1185 for the b200 side. Re-applies the b200-specific
changes from #1158 on top of the #1186 baseline.

- Image pinned to lmsysorg/sglang:deepseek-v4-blackwell@sha256:df18bfc4...
- Adds DP_ATTENTION env knob and SGLANG_OPT_* perf env vars
- Search space gets conc-start=1 in low-latency rows (was 4)
- Recipe-per-CONC dispatch in script: low-latency / balanced /
  max-throughput selected by DP_ATTENTION + CONC

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cquil11 added a commit that referenced this pull request Apr 26, 2026
…C split + DP-attn SWA tweak (#1185)

* dsv4-fp4-b300-sglang: recipe-per-CONC split + DP-attn SWA tweak

Squashes the cumulative changes from #1158 and #1174 into a single
commit on top of the #1184 baseline. Excludes the iterative
--max-running-requests floor from #1173.

- Image pinned to lmsysorg/sglang:deepseek-v4-b300@sha256:26e116bd...
- Search space: TP8/EP1 conc=1, TP4/EP1 conc=32, TP4/EP4 dp-attn
  conc=512 for both 1k1k and 8k1k
- Script dispatches on DP_ATTENTION knob: TP-only (flashinfer_mxfp4)
  vs DP-attn (deepep + prefill-delayer + mega_moe env vars)
- DP-attn path enables SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1
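
The dispatch summarized in the bullets above might look something like the sketch below. It is a hedged outline rather than the real script: `select_recipe` and `dp_attention_from_env` are hypothetical helpers, the "true"/"false" encoding of the `DP_ATTENTION` knob is an assumption, and the component labels are copied from the commit message instead of being expanded into concrete server flags.

```python
import os
from typing import Dict, Mapping


def dp_attention_from_env(environ: Mapping[str, str] = os.environ) -> bool:
    """Read the DP_ATTENTION knob.

    Assumption: the knob is the string "true" or "false"; the real
    script's parsing is not shown in the commit message.
    """
    return environ.get("DP_ATTENTION", "false").strip().lower() == "true"


def select_recipe(dp_attention: bool) -> Dict[str, object]:
    """Pick one of the two recipes described in the commit message."""
    if dp_attention:
        return {
            "name": "dp-attn",
            # Labels as written in the commit message, not literal flags.
            "components": ["deepep", "prefill-delayer", "mega_moe env vars"],
            "env": {"SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN": "1"},
        }
    return {
        "name": "tp-only",
        "components": ["flashinfer_mxfp4"],
        "env": {},
    }


if __name__ == "__main__":
    recipe = select_recipe(dp_attention_from_env())
    print(recipe["name"], recipe["components"], recipe["env"])
```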

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: add pr-link for #1185

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Apply suggestion from @Qiaolin-Yu

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
cquil11 added a commit that referenced this pull request Apr 26, 2026
…C split (#1187)

* dsv4-fp4-b200-sglang: recipe-per-CONC dispatch (re-apply b200 part of #1158)

Mirror of #1185 for the b200 side. Re-applies the b200-specific
changes from #1158 on top of the #1186 baseline.

- Image pinned to lmsysorg/sglang:deepseek-v4-blackwell@sha256:df18bfc4...
- Adds DP_ATTENTION env knob and SGLANG_OPT_* perf env vars
- Search space gets conc-start=1 in low-latency rows (was 4)
- Recipe-per-CONC dispatch in script: low-latency / balanced /
  max-throughput selected by DP_ATTENTION + CONC

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: add pr-link for #1187

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* dsv4-fp4-b200-sglang: restore --disable-radix-cache flag

The flag was accidentally dropped during the recipe-per-CONC rewrite.
Restoring it to match the baseline methodology (prefix caching disabled)
and stay consistent with all other dsv4 sister scripts.

Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update changelog and YAML comments to match two-way DP_ATTENTION dispatch

The script has two branches (DP_ATTENTION true/false), not three CONC-keyed
recipes. Both balanced and max-throughput rows use the same DP-attention +
DeepEP flags — only --max-running-requests differs. Updated the
nvidia-master.yaml comment block and perf-changelog description to accurately
reflect this two-recipe dispatch.
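
The point of this fix, that three row labels collapse onto two script branches, can be illustrated with a small sketch. The names `ROW_DP_ATTENTION` and `branch_for` are hypothetical, and mapping the low-latency rows to `DP_ATTENTION=false` is an assumption inferred from the message, which only states that the balanced and max-throughput rows share the DP-attention + DeepEP branch.

```python
# Row label -> value of the DP_ATTENTION knob.
# Assumption: low-latency rows run with DP_ATTENTION off.
ROW_DP_ATTENTION = {
    "low-latency": False,
    "balanced": True,
    "max-throughput": True,
}


def branch_for(row: str) -> str:
    """Return which of the two script branches a search-space row uses."""
    if ROW_DP_ATTENTION[row]:
        return "dp-attention + DeepEP"
    return "no dp-attention"


if __name__ == "__main__":
    for row in ROW_DP_ATTENTION:
        print(f"{row}: {branch_for(row)}")
```

Under these assumptions the balanced and max-throughput rows resolve to the same branch, so any difference between them has to come from per-row values such as --max-running-requests rather than from a third recipe.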

Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>

3 participants