sglang dsv4 MTP by hnyls2002 · Pull Request #1166 · SemiAnalysisAI/InferenceX

hnyls2002 · 2026-04-26T03:51:36Z

Add MTP recipe for DSv4 FP4 on B300 with SGLang.

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Qiaolin-Yu · 2026-04-26T04:37:27Z

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp

github-actions · 2026-04-26T04:37:37Z

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24948369055
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
Pinned ref: 1b34a8d
Approval: not required (trusted collaborator).

Qiaolin-Yu · 2026-04-26T05:05:46Z

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp

github-actions · 2026-04-26T05:05:55Z

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24948810957
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
Pinned ref: 481482a
Approval: not required (trusted collaborator).

Qiaolin-Yu · 2026-04-26T05:27:55Z

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp

github-actions · 2026-04-26T05:28:05Z

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24949162299
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
Pinned ref: 47fefec
Approval: not required (trusted collaborator).

Oseltamivir

lgtm

yhyang201 · 2026-04-26T09:54:09Z

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp

github-actions · 2026-04-26T09:54:19Z

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24953792156
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
Pinned ref: f64505b
Approval: not required (trusted collaborator).

The tokenizer for DSv4-Pro has no chat_template set, so --use-chat-template makes benchmark_serving.py crash with a ValueError. Remove the flag to align with dsv4_fp4_b300_sglang.sh. Also add a floor of 8 to --max-running-requests to match the base script and avoid values that are too low at low concurrency.
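The floor on --max-running-requests mentioned above amounts to a simple clamp. The sketch below is illustrative only: the real change lives in a shell script, and the function name and the assumption that the input value is derived from the benchmark concurrency are hypothetical, not taken from the PR.

```python
def max_running_requests(derived: int, floor: int = 8) -> int:
    """Clamp a derived --max-running-requests value to a minimum floor.

    `derived` stands in for whatever value the script computes (assumed
    here to follow the benchmark concurrency); `floor` is the 8 from the
    commit message.
    """
    return max(derived, floor)


if __name__ == "__main__":
    # Low concurrencies are raised to the floor; higher ones pass through.
    for conc in (1, 4, 8, 64):
        print(conc, max_running_requests(conc))
```

With this clamp, a concurrency of 1 or 4 still yields 8, while 64 is left unchanged.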


yhyang201 · 2026-04-26T12:17:36Z

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp

github-actions · 2026-04-26T12:17:46Z

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24956420257
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
Pinned ref: 4155a49
Approval: not required (trusted collaborator).

yhyang201 · 2026-04-26T13:15:08Z

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp

github-actions · 2026-04-26T13:15:16Z

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24957535378
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
Pinned ref: 4155a49
Approval: not required (trusted collaborator).


yhyang201 · 2026-04-26T13:32:37Z

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp

github-actions · 2026-04-26T13:32:45Z

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24957880039
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
Pinned ref: c1d65ee
Approval: not required (trusted collaborator).


* dsv4-fp4-b300-sglang: revert to #1143 low-latency-only baseline

  Reverts the matrix expansion (#1132), script edits (#1158, #1173, #1174), and changelog retriggers (#1178) on top of the original #1143 entry. Restores the script and config block to their #1143 state and clears all prior dsv4-fp4-b300-sglang changelog entries to start fresh. The dsv4-fp4-b300-sglang-mtp config (#1166) is untouched.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: add pr-link for #1184

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: keep only the original #1143 entry, drop new entry

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sglang dsv4 mtp

148223d

hnyls2002 requested a review from a team April 26, 2026 03:51

github-project-automation Bot added this to InferenceMAX Board Apr 26, 2026

claude Bot reviewed Apr 26, 2026


Oseltamivir added the sweep-enabled label Apr 26, 2026

hnyls2002 added 2 commits April 25, 2026 21:02

knob-driven recipe selection

c883e8d

self-contained mtp config; recipe via dp-attn

3a49ed1

hnyls2002 requested review from jgangani and kedarpotdar-nv as code owners April 26, 2026 04:09

hnyls2002 added 2 commits April 25, 2026 21:20

add mtp_1 (1/1/2) variant

6f1b80a

knob-driven recipe selection

1b34a8d

Fridge003 added the full-sweep-enabled label Apr 26, 2026

pin sglang image to mega_moe-capable digest

481482a

Oseltamivir removed the full-sweep-enabled label Apr 26, 2026

drop mtp_1 knob; align with PR SemiAnalysisAI#1158 image digest

47fefec

Oseltamivir removed the sweep-enabled label Apr 26, 2026

Oseltamivir approved these changes Apr 26, 2026


Merge branch 'main' into sglang-dsv4-MTP

bfa254d

Oseltamivir requested a review from Qiaolin-Yu as a code owner April 26, 2026 05:41

Qiaolin-Yu approved these changes Apr 26, 2026


yhyang201 and others added 3 commits April 26, 2026 17:37

update nvidia-master.yaml

287ef26

Merge branch 'main' into sglang-dsv4-MTP

e4ddf8f

fix: restore trailing newline in perf-changelog.yaml

f64505b

yhyang201 force-pushed the sglang-dsv4-MTP branch from fffb295 to f64505b Compare April 26, 2026 09:52

yhyang201 added 3 commits April 26, 2026 20:09

perf-changelog: add dsv4-fp4-b300-sglang-mtp entry

fc93e84

Rebase perf-changelog.yaml on latest main (preserving SemiAnalysisAI#1173 and SemiAnalysisAI#1174 entries) and append the MTP config entry for PR SemiAnalysisAI#1166.

merge main and resolve perf-changelog.yaml conflict

4155a49

claude Bot mentioned this pull request Apr 26, 2026

dsv4-b300-sglang: add conc=2048 recipe & MTP benchmark #1176

Closed

3 tasks

dsv4-b300-sglang-mtp: tune EAGLE spec params from (3,1,4) to (4,1,5)

c1d65ee

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

yhyang201 force-pushed the sglang-dsv4-MTP branch from e2ae658 to c1d65ee Compare April 26, 2026 14:53

Revert "dsv4-b300-sglang-mtp: tune EAGLE spec params from (3,1,4) to (4,1,5)"

86973f3

This reverts the EAGLE spec params back to (3, 1, 4): --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
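To make the tuple shorthand concrete, here is a minimal sketch that expands an EAGLE parameter triple into the three flags named in the revert message. The class and constant names are hypothetical. The mapping of (4,1,5) onto the same flag order is an assumption based on how the commit spells out (3, 1, 4).

```python
from typing import List, NamedTuple


class SpecParams(NamedTuple):
    """Speculative decoding triple, in the order used in this thread."""
    num_steps: int
    eagle_topk: int
    num_draft_tokens: int

    def to_flags(self) -> List[str]:
        # Flag names are copied from the revert commit message.
        return [
            "--speculative-num-steps", str(self.num_steps),
            "--speculative-eagle-topk", str(self.eagle_topk),
            "--speculative-num-draft-tokens", str(self.num_draft_tokens),
        ]


BASELINE = SpecParams(3, 1, 4)  # restored by the revert (86973f3)
TUNED = SpecParams(4, 1, 5)     # tried in c1d65ee, then reverted

if __name__ == "__main__":
    print(" ".join(BASELINE.to_flags()))
    print(" ".join(TUNED.to_flags()))
```

In both triples the draft-token count equals the step count plus one. That is an observation about the values in this thread only, not a documented constraint.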

Qiaolin-Yu merged commit 009f11d into SemiAnalysisAI:main Apr 26, 2026

github-project-automation Bot moved this to Done in InferenceMAX Board Apr 26, 2026

This was referenced Apr 26, 2026

dsv4-fp4-b300-sglang-mtp: pass --dsv4 to use DSv4 chat template #1182

Merged

dsv4-fp4-b300-sglang: revert to #1143 low-latency-only baseline #1184

Merged

claude Bot mentioned this pull request Apr 26, 2026

[co-authored with sglang community maintainers leads at radixark] [NVIDIA][SGLang][redo PR] B300 DeepSeek v4 FP4 SGLang: recipe-per-CONC split + DP-attn SWA tweak #1185

Merged

3 tasks

cquil11 mentioned this pull request Apr 26, 2026

Add B300 config: dsv4-fp4-sglang-mtp #1151

Closed

3 tasks


5 participants