Skip to content

sglang dsv4 MTP#1166

Merged
Qiaolin-Yu merged 16 commits intoSemiAnalysisAI:mainfrom
Qiaolin-Yu:sglang-dsv4-MTP
Apr 26, 2026
Merged

sglang dsv4 MTP#1166
Qiaolin-Yu merged 16 commits intoSemiAnalysisAI:mainfrom
Qiaolin-Yu:sglang-dsv4-MTP

Conversation

@hnyls2002
Copy link
Copy Markdown
Collaborator

Add MTP recipe for DSv4 FP4 on B300 with SGLang.

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@Qiaolin-Yu
Copy link
Copy Markdown
Collaborator

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp

@github-actions
Copy link
Copy Markdown
Contributor

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24948369055
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
Pinned ref: 1b34a8d
Approval: not required (trusted collaborator).

@Qiaolin-Yu
Copy link
Copy Markdown
Collaborator

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp

@github-actions
Copy link
Copy Markdown
Contributor

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24948810957
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
Pinned ref: 481482a
Approval: not required (trusted collaborator).

@Qiaolin-Yu
Copy link
Copy Markdown
Collaborator

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp

@github-actions
Copy link
Copy Markdown
Contributor

@Qiaolin-Yu Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24949162299
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
Pinned ref: 47fefec
Approval: not required (trusted collaborator).

Copy link
Copy Markdown
Collaborator

@Oseltamivir Oseltamivir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@Oseltamivir Oseltamivir requested a review from Qiaolin-Yu as a code owner April 26, 2026 05:41
@yhyang201
Copy link
Copy Markdown
Collaborator

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp

@github-actions
Copy link
Copy Markdown
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24953792156
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
Pinned ref: f64505b
Approval: not required (trusted collaborator).

The tokenizer for DSv4-Pro has no chat_template set, so
--use-chat-template causes benchmark_serving.py to crash with
ValueError. Remove it to align with dsv4_fp4_b300_sglang.sh.

Also add a floor of 8 to --max-running-requests to match the
base script and avoid too-low values at low concurrency.
Rebase perf-changelog.yaml on latest main (preserving SemiAnalysisAI#1173 and SemiAnalysisAI#1174
entries) and append the MTP config entry for PR SemiAnalysisAI#1166.
@yhyang201
Copy link
Copy Markdown
Collaborator

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp

@github-actions
Copy link
Copy Markdown
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24956420257
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
Pinned ref: 4155a49
Approval: not required (trusted collaborator).

@yhyang201
Copy link
Copy Markdown
Collaborator

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp

@github-actions
Copy link
Copy Markdown
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24957535378
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
Pinned ref: 4155a49
Approval: not required (trusted collaborator).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yhyang201
Copy link
Copy Markdown
Collaborator

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp

@github-actions
Copy link
Copy Markdown
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24957880039
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang-mtp
Pinned ref: c1d65ee
Approval: not required (trusted collaborator).

…(4,1,5)"

This reverts the EAGLE spec params back to (3, 1, 4):
  --speculative-num-steps 3
  --speculative-eagle-topk 1
  --speculative-num-draft-tokens 4

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@Qiaolin-Yu Qiaolin-Yu merged commit 009f11d into SemiAnalysisAI:main Apr 26, 2026
cquil11 added a commit that referenced this pull request Apr 26, 2026
* dsv4-fp4-b300-sglang: revert to #1143 low-latency-only baseline

Reverts the matrix expansion (#1132), script edits (#1158, #1173, #1174),
and changelog retriggers (#1178) on top of the original #1143 entry.
Restores the script and config block to their #1143 state and clears
all prior dsv4-fp4-b300-sglang changelog entries to start fresh.

The dsv4-fp4-b300-sglang-mtp config (#1166) is untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: add pr-link for #1184

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: keep only the original #1143 entry, drop new entry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

5 participants