fix sgl b200/b300 dpsk-v4 script by Qiaolin-Yu · Pull Request #1158 · SemiAnalysisAI/InferenceX

Qiaolin-Yu · 2026-04-25T22:58:07Z

No description provided.

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Fridge003 · 2026-04-25T23:16:59Z

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b200-sglang

Fridge003 · 2026-04-25T23:17:07Z

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

github-actions · 2026-04-25T23:17:09Z

@Fridge003 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24943006152
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b200-sglang
Pinned ref: 3141900
Approval: not required (trusted collaborator).

github-actions · 2026-04-25T23:17:17Z

@Fridge003 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24943008546
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 3141900
Approval: not required (trusted collaborator).

Qiaolin-Yu · 2026-04-26T01:53:05Z

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b200-sglang

Qiaolin-Yu · 2026-04-26T01:53:13Z

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

github-actions · 2026-04-26T01:53:14Z

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24945687144
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b200-sglang
Pinned ref: 485cb8e
Approval: not required (trusted collaborator).

github-actions · 2026-04-26T01:53:21Z

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24945689122
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 485cb8e
Approval: not required (trusted collaborator).

Qiaolin-Yu · 2026-04-26T04:00:15Z

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

github-actions · 2026-04-26T04:00:26Z

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24947777157
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 6f9ab84
Approval: not required (trusted collaborator).

Qiaolin-Yu · 2026-04-26T04:41:12Z

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

github-actions · 2026-04-26T04:41:20Z

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24948424840
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 7a4d415
Approval: not required (trusted collaborator).

Qiaolin-Yu · 2026-04-26T05:04:37Z

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

github-actions · 2026-04-26T05:04:47Z

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24948791360
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 0f975c8
Approval: not required (trusted collaborator).

Qiaolin-Yu · 2026-04-26T09:12:50Z

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

github-actions · 2026-04-26T09:12:59Z

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24953046586
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 7273032
Approval: not required (trusted collaborator).

* sglang dsv4 mtp
* knob-driven recipe selection
* self-contained mtp config; recipe via dp-attn
* add mtp_1 (1/1/2) variant
* knob-driven recipe selection
* pin sglang image to mega_moe-capable digest
* drop mtp_1 knob; align with PR #1158 image digest
* update nvidia-master.yaml
* fix: restore trailing newline in perf-changelog.yaml
* fix: remove --use-chat-template and floor --max-running-requests at 8

  The tokenizer for DSv4-Pro has no chat_template set, so --use-chat-template causes benchmark_serving.py to crash with ValueError. Remove it to align with dsv4_fp4_b300_sglang.sh. Also add a floor of 8 to --max-running-requests to match the base script and avoid too-low values at low concurrency.
* perf-changelog: add dsv4-fp4-b300-sglang-mtp entry

  Rebase perf-changelog.yaml on latest main (preserving #1173 and #1174 entries) and append the MTP config entry for PR #1166.
* dsv4-b300-sglang-mtp: tune EAGLE spec params from (3,1,4) to (4,1,5)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Revert "dsv4-b300-sglang-mtp: tune EAGLE spec params from (3,1,4) to (4,1,5)"

  This reverts the EAGLE spec params back to (3, 1, 4):
  --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Bryan Shan <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
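The floor on --max-running-requests described in the commit list above can be sketched roughly as follows. This is a minimal illustration, not the script from the PR: the helper name `floor_max_running_requests` is hypothetical, and deriving the value from a `CONC` variable is an assumption based on the commit's mention of "low concurrency". Only the floor value of 8 comes from the commit message.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): clamp a requested value
# to a minimum of 8, the floor named in the commit message.
floor_max_running_requests() {
  local requested="$1"
  local floor=8
  if (( requested < floor )); then
    echo "$floor"
  else
    echo "$requested"
  fi
}

# Assumption: the requested value tracks the benchmark concurrency (CONC).
CONC="${CONC:-1}"
MAX_RUNNING_REQUESTS="$(floor_max_running_requests "$CONC")"
echo "--max-running-requests ${MAX_RUNNING_REQUESTS}"
```

With a clamp like this, a low-concurrency run (for example CONC=1) would still pass 8 to the server, while larger values pass through unchanged.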

* dsv4-fp4-b300-sglang: revert to #1143 low-latency-only baseline

  Reverts the matrix expansion (#1132), script edits (#1158, #1173, #1174), and changelog retriggers (#1178) on top of the original #1143 entry. Restores the script and config block to their #1143 state and clears all prior dsv4-fp4-b300-sglang changelog entries to start fresh. The dsv4-fp4-b300-sglang-mtp config (#1166) is untouched.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf-changelog: add pr-link for #1184

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf-changelog: keep only the original #1143 entry, drop new entry

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mirrors the b300 revert in #1184. Restores benchmarks/single_node/dsv4_fp4_b200.sh and the dsv4-fp4-b200-sglang block in nvidia-master.yaml to their pre-#1158 state (= post-#1131 baseline): un-pins the image digest and restores conc-start=4 in the low-latency rows.

No perf-changelog edit needed; #1158 did not add a b200 changelog entry.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…1158)

Mirror of #1185 for the b200 side. Re-applies the b200-specific changes from #1158 on top of the #1186 baseline.

- Image pinned to lmsysorg/sglang:deepseek-v4-blackwell@sha256:df18bfc4...
- Adds DP_ATTENTION env knob and SGLANG_OPT_* perf env vars
- Search space gets conc-start=1 in low-latency rows (was 4)
- Recipe-per-CONC dispatch in script: low-latency / balanced / max-throughput selected by DP_ATTENTION + CONC

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@Qiaolin-Yu

…C split + DP-attn SWA tweak (#1185)

* dsv4-fp4-b300-sglang: recipe-per-CONC split + DP-attn SWA tweak

  Squashes the cumulative changes from #1158 and #1174 into a single commit on top of the #1184 baseline. Excludes the iterative --max-running-requests floor from #1173.

  - Image pinned to lmsysorg/sglang:deepseek-v4-b300@sha256:26e116bd...
  - Search space: TP8/EP1 conc=1, TP4/EP1 conc=32, TP4/EP4 dp-attn conc=512 for both 1k1k and 8k1k
  - Script dispatches on DP_ATTENTION knob: TP-only (flashinfer_mxfp4) vs DP-attn (deepep + prefill-delayer + mega_moe env vars)
  - DP-attn path enables SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf-changelog: add pr-link for #1185

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Apply suggestion from @Qiaolin-Yu

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>

…C split (#1187)

* dsv4-fp4-b200-sglang: recipe-per-CONC dispatch (re-apply b200 part of #1158)

  Mirror of #1185 for the b200 side. Re-applies the b200-specific changes from #1158 on top of the #1186 baseline.

  - Image pinned to lmsysorg/sglang:deepseek-v4-blackwell@sha256:df18bfc4...
  - Adds DP_ATTENTION env knob and SGLANG_OPT_* perf env vars
  - Search space gets conc-start=1 in low-latency rows (was 4)
  - Recipe-per-CONC dispatch in script: low-latency / balanced / max-throughput selected by DP_ATTENTION + CONC

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* perf-changelog: add pr-link for #1187

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* dsv4-fp4-b200-sglang: restore --disable-radix-cache flag

  The flag was accidentally dropped during the recipe-per-CONC rewrite. Restoring it to match the baseline methodology (prefix caching disabled) and stay consistent with all other dsv4 sister scripts.

  Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: update changelog and YAML comments to match two-way DP_ATTENTION dispatch

  The script has two branches (DP_ATTENTION true/false), not three CONC-keyed recipes. Both balanced and max-throughput rows use the same DP-attention + DeepEP flags; only --max-running-requests differs. Updated the nvidia-master.yaml comment block and perf-changelog description to accurately reflect this two-recipe dispatch.

  Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
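The two-way dispatch on DP_ATTENTION described above might look roughly like the sketch below. It is an illustration under assumptions, not the merged script: the function name `select_recipe` and the recipe labels `tp-only` and `dp-attention` are hypothetical, and the actual server flags for each branch are deliberately left out because the commit messages do not list them in full.

```shell
#!/usr/bin/env bash
# Sketch of a two-branch dispatch keyed on DP_ATTENTION (true/false).
# Function name and recipe labels are illustrative assumptions; the real
# per-branch server flags are intentionally not reproduced here.
select_recipe() {
  local dp_attention="${1:-false}"
  case "$dp_attention" in
    true)  echo "dp-attention" ;;
    false) echo "tp-only" ;;
    *)
      echo "unknown DP_ATTENTION value: ${dp_attention}" >&2
      return 1
      ;;
  esac
}

DP_ATTENTION="${DP_ATTENTION:-false}"
CONC="${CONC:-1}"
RECIPE="$(select_recipe "$DP_ATTENTION")"

# Per the commit message, CONC does not select a third recipe; it only
# influences values such as --max-running-requests within a branch.
echo "recipe=${RECIPE} conc=${CONC}"
```

Keeping the branch choice in one small function makes it easier to keep the YAML comments and changelog text consistent with what the script actually does, which is the mismatch the final commit in this list corrects.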

fix sgl b200/b300 script

3a2e459

github-project-automation Bot added this to InferenceMAX Board Apr 25, 2026

Qiaolin-Yu requested a review from a team April 25, 2026 22:58

claude Bot reviewed Apr 25, 2026

ping docker image

3141900

Qiaolin-Yu requested review from jgangani and kedarpotdar-nv as code owners April 25, 2026 23:12

Qiaolin-Yu changed the title ~~fix sgl b200/b300 dpsk-v4 script~~ [wip] fix sgl b200/b300 dpsk-v4 script Apr 26, 2026

Qiaolin-Yu added 3 commits April 25, 2026 17:50

fix

ef41995

fix

7d107e8

fix

485cb8e

Qiaolin-Yu force-pushed the main branch from 6f9ab84 to 485cb8e Compare April 26, 2026 04:34

Qiaolin-Yu added 2 commits April 25, 2026 21:35

Merge branch 'main' into main

289a855

tune swa-full-tokens-ratio

7a4d415

Qiaolin-Yu changed the title ~~[wip] fix sgl b200/b300 dpsk-v4 script~~ fix sgl b200/b300 dpsk-v4 script Apr 26, 2026

Qiaolin-Yu added the full-sweep-enabled label Apr 26, 2026

high concurrency

0f975c8

hnyls2002 added a commit to Qiaolin-Yu/InferenceX that referenced this pull request Apr 26, 2026

drop mtp_1 knob; align with PR SemiAnalysisAI#1158 image digest

47fefec

change perflog

68e40f8

hnyls2002 approved these changes Apr 26, 2026

Qiaolin-Yu added 4 commits April 25, 2026 22:53

Merge branch 'main' into main

c09b03f

upd

e8e4810

fix hang

cbde82d

remove useless points

7273032

Qiaolin-Yu merged commit 8a174e0 into SemiAnalysisAI:main Apr 26, 2026
13 of 14 checks passed

github-project-automation Bot moved this to Done in InferenceMAX Board Apr 26, 2026

claude Bot mentioned this pull request Apr 26, 2026

retry sglang b300 #1171

Merged

cquil11 mentioned this pull request Apr 26, 2026

dsv4-fp4-b300-sglang: revert to #1143 low-latency-only baseline #1184

Merged

2 tasks

This was referenced Apr 26, 2026

[co-authored with sglang community maintainers leads at radixark] [NVIDIA][SGLang][redo PR] B300 DeepSeek v4 FP4 SGLang: recipe-per-CONC split + DP-attn SWA tweak #1185

Merged

dsv4-fp4-b200-sglang: revert b200 portion of #1158 #1186

Merged

cquil11 mentioned this pull request Apr 26, 2026

[NVIDIA][SGLang][redo PR] B200 DeepSeek v4 FP4 SGLang: recipe-per-CONC split #1187

Merged

2 tasks

3 participants