dsv4-fp4 sglang b300: floor --max-running-requests at 8 #1173

Merged
Qiaolin-Yu merged 3 commits into SemiAnalysisAI:main from sglang-bot:dsv4-fp4-sglang-floor-max-running-requests
Apr 26, 2026
sglang-bot commented Apr 26, 2026

Summary

Test plan

  • Run a low-CONC sglang benchmark (e.g. CONC=1) on b300 and confirm the server is launched with --max-running-requests 8.
  • Run a high-CONC sglang benchmark and confirm --max-running-requests is still CONC * 3 / 2, unchanged by the floor.
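
The behavior the test plan checks can be sketched in shell as follows. This is a minimal illustration, not the script's actual code: it assumes the value is derived from a `CONC` environment variable with integer arithmetic, and the function name `max_running_requests` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the benchmark script): compute the
# --max-running-requests value as CONC * 3 / 2 (integer division),
# floored at 8.
max_running_requests() {
  conc="$1"
  value=$(( conc * 3 / 2 ))
  # Apply the floor so low-concurrency runs never go below 8.
  if [ "$value" -lt 8 ]; then
    value=8
  fi
  echo "$value"
}

# Print the flag as it would appear on the launch command line.
echo "--max-running-requests $(max_running_requests "${CONC:-1}")"
```

Under these assumptions, CONC=1 gives a raw value of 1 and is raised to 8, while CONC=32 gives 48 and is left unchanged by the floor.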

🤖 Generated with Claude Code

Mirrors the floor-of-4 pattern from the mi355x atom script (SemiAnalysisAI#1170);
prevents tiny CONC values from yielding sub-optimal max-running-requests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
claude (bot) left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

sglang-bot and others added 2 commits April 26, 2026 03:35
…ts floor)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
sglang-bot changed the title from "dsv4-fp4 sglang b200/b300: floor --max-running-requests at 8" to "dsv4-fp4 sglang b300: floor --max-running-requests at 8" Apr 26, 2026
Qiaolin-Yu merged commit 00fe30d into SemiAnalysisAI:main Apr 26, 2026
yhyang201 added a commit to Qiaolin-Yu/InferenceX that referenced this pull request Apr 26, 2026
Rebase perf-changelog.yaml on latest main (preserving SemiAnalysisAI#1173 and SemiAnalysisAI#1174
entries) and append the MTP config entry for PR SemiAnalysisAI#1166.
Qiaolin-Yu pushed a commit that referenced this pull request Apr 26, 2026
* sglang dsv4 mtp

* knob-driven recipe selection

* self-contained mtp config; recipe via dp-attn

* add mtp_1 (1/1/2) variant

* knob-driven recipe selection

* pin sglang image to mega_moe-capable digest

* drop mtp_1 knob; align with PR #1158 image digest

* update nvidia-master.yaml

* fix: restore trailing newline in perf-changelog.yaml

* fix: remove --use-chat-template and floor --max-running-requests at 8

The tokenizer for DSv4-Pro has no chat_template set, so
--use-chat-template causes benchmark_serving.py to crash with
ValueError. Remove it to align with dsv4_fp4_b300_sglang.sh.

Also add a floor of 8 to --max-running-requests to match the
base script and avoid too-low values at low concurrency.

* perf-changelog: add dsv4-fp4-b300-sglang-mtp entry

Rebase perf-changelog.yaml on latest main (preserving #1173 and #1174
entries) and append the MTP config entry for PR #1166.

* dsv4-b300-sglang-mtp: tune EAGLE spec params from (3,1,4) to (4,1,5)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revert "dsv4-b300-sglang-mtp: tune EAGLE spec params from (3,1,4) to (4,1,5)"

This reverts the EAGLE spec params back to (3, 1, 4):
  --speculative-num-steps 3
  --speculative-eagle-topk 1
  --speculative-num-draft-tokens 4
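
For reference, the (3, 1, 4) tuple maps onto the three flags as sketched below. The helper name `eagle_spec_args` is hypothetical; only the flag names and values come from this commit message.

```shell
# Hypothetical helper: expand a (steps, topk, draft-tokens) tuple into the
# three speculative-decoding flags named in the commit message.
eagle_spec_args() {
  echo "--speculative-num-steps $1 --speculative-eagle-topk $2 --speculative-num-draft-tokens $3"
}

# The reverted-to configuration (3, 1, 4).
eagle_spec_args 3 1 4
```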

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Bryan Shan <58582368+Oseltamivir@users.noreply.github.com>
Co-authored-by: Yuhao Yang <47235274+yhyang201@users.noreply.github.com>
Co-authored-by: yhyang201 <yhyang201@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
cquil11 added a commit that referenced this pull request Apr 26, 2026
* dsv4-fp4-b300-sglang: revert to #1143 low-latency-only baseline

Reverts the matrix expansion (#1132), script edits (#1158, #1173, #1174),
and changelog retriggers (#1178) on top of the original #1143 entry.
Restores the script and config block to their #1143 state and clears
all prior dsv4-fp4-b300-sglang changelog entries to start fresh.

The dsv4-fp4-b300-sglang-mtp config (#1166) is untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: add pr-link for #1184

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: keep only the original #1143 entry, drop new entry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cquil11 added a commit that referenced this pull request Apr 26, 2026
…C split + DP-attn SWA tweak (#1185)

* dsv4-fp4-b300-sglang: recipe-per-CONC split + DP-attn SWA tweak

Squashes the cumulative changes from #1158 and #1174 into a single
commit on top of the #1184 baseline. Excludes the iterative
--max-running-requests floor from #1173.

- Image pinned to lmsysorg/sglang:deepseek-v4-b300@sha256:26e116bd...
- Search space: TP8/EP1 conc=1, TP4/EP1 conc=32, TP4/EP4 dp-attn
  conc=512 for both 1k1k and 8k1k
- Script dispatches on DP_ATTENTION knob: TP-only (flashinfer_mxfp4)
  vs DP-attn (deepep + prefill-delayer + mega_moe env vars)
- DP-attn path enables SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1
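
The knob-driven dispatch described in the list above might look roughly like the sketch below. It is an assumption-laden illustration, not the script itself: the `true`/`false` values for `DP_ATTENTION`, the function name `select_recipe`, and the recipe labels are all hypothetical, and the backend-specific server flags are omitted.

```shell
# Hypothetical sketch of dispatching on the DP_ATTENTION knob.
# The "true"/"false" values and the recipe labels are assumptions.
select_recipe() {
  if [ "$1" = "true" ]; then
    echo "dp-attn"
  else
    echo "tp-only"
  fi
}

recipe="$(select_recipe "${DP_ATTENTION:-false}")"
if [ "$recipe" = "dp-attn" ]; then
  # Per the commit message, the DP-attn path enables this setting.
  export SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1
fi
echo "recipe: $recipe"
```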

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: add pr-link for #1185

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Apply suggestion from @Qiaolin-Yu

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>