dsv4-fp4-b300-sglang-mtp: pass --dsv4 to use DSv4 chat template by cquil11 · Pull Request #1182 · SemiAnalysisAI/InferenceX

cquil11 · 2026-04-26T18:33:45Z

Summary

Follow-up to sglang dsv4 MTP #1166 (sglang dsv4 MTP). Adds --dsv4 to the run_benchmark_serving call in benchmarks/single_node/dsv4_fp4_b300_sglang_mtp.sh, which routes prompts through encoding_dsv4.py (added in [DSv4] add jinja chat template support #1153) and emits the <bos><User>...<Assistant><think> framing DeepSeek-V4-Pro expects.
sglang dsv4 MTP #1166 had to strip --use-chat-template because the DSv4-Pro tokenizer ships without a jinja chat_template — --dsv4 is exactly the escape hatch for that, since it implies --use-chat-template but uses the self-contained DSv4 encoder instead of the tokenizer's missing template.
Restores AGENTS.md compliance: MTP scripts must benchmark against chat-formatted inputs because EAGLE acceptance rate silently regresses on raw random tokens.
Adds matching perf-changelog.yaml entry to retrigger the dsv4-fp4-b300-sglang-mtp sweep.

Test plan

CI sweep for dsv4-fp4-b300-sglang-mtp runs on this PR (perf-changelog entry triggers it)
Compare EAGLE acceptance rate / output throughput vs. the no-chat-template numbers from sglang dsv4 MTP #1166

🤖 Generated with Claude Code

Routes benchmark prompts through encoding_dsv4.py (added in PR #1153) so DeepSeek-V4-Pro receives the <bos><User>...<Assistant><think> framing it was trained against. PR #1166 had to drop --use-chat-template because the DSv4-Pro tokenizer has no jinja chat_template, which is exactly what --dsv4 sidesteps. Restores AGENTS.md compliance for MTP scripts (EAGLE acceptance rate silently regresses against raw random tokens). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-26T18:33:54Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude

Additional findings (outside current diff — PR may have been updated during review):

🟡 perf-changelog.yaml:1912 — The new perf-changelog.yaml entry at line 1912 has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX — a literal XXX placeholder that was never substituted with this PR's number (1182). Every other entry in the file uses a real PR number, and AGENTS.md documents that format. Trivial fix: replace XXX with 1182.
Extended reasoning...

What's wrong

The newly-added perf-changelog entry has an unresolved placeholder in its pr-link field:
```
- config-keys:
    - dsv4-fp4-b300-sglang-mtp
  description:
    - "Pass --dsv4 (PR #1153) to run_benchmark_serving ..."
    ...
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX
```
The literal string XXX is the template marker shown in AGENTS.md (lines 266 and 289) for new entries — it should have been replaced with the actual PR number (1182) before opening this PR.

How it manifests

Any consumer of perf-changelog.yaml that turns pr-link into a clickable hyperlink (changelog renderers, the perf-tracking dashboard, anything that surfaces the changelog as docs) will produce a broken link pointing at /pull/XXX, which is not a valid PR. The audit trail tying this changelog row back to its source PR is lost.

Why existing checks don't catch it

I confirmed via the verifier observations that utils/matrix_logic/validation.py only requires pr_link to be a non-empty string — it doesn't validate that the URL points to a real PR or even that the trailing path component is numeric. CI sweep triggering is keyed on config-keys, not pr-link, so the sweep for dsv4-fp4-b300-sglang-mtp will still run; the broken link is purely cosmetic / documentation.

Step-by-step proof
1. Open perf-changelog.yaml and jump to the diff hunk at line 1912.
2. The added entry's last field is literally pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX.
3. Compare against the other entries added in adjacent diffs — every one ends in a numeric PR id (e.g. /pull/1166, /pull/1174, /pull/1178, /pull/1155). A grep for pull/XXX across the file matches only this new line; every other entry (~150+) uses a real number.
4. The PR metadata for this change shows it is PR #1182, so the correct value is /pull/1182.
5. AGENTS.md (lines ~266, ~289) shows pull/XXX as the placeholder template authors are expected to substitute when adding a new entry. That substitution was missed here.
Impact
- Functional: none — the sweep still triggers because triggering keys off config-keys.
- Documentation/audit: a broken link in the rendered changelog and a lost back-reference from the entry to PR #1182.
- Convention: violates the established format used by every other entry in the file.
Fix

Replace pull/XXX with pull/1182 on the new line. One-character (well, three-character) edit.

cquil11 requested a review from a team April 26, 2026 18:33

github-project-automation Bot added this to InferenceMAX Board Apr 26, 2026

perf-changelog: fill in PR #1182 link

a351d19

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cquil11 added the full-sweep-enabled label Apr 26, 2026

perf-changelog: trim dsv4-mtp entry

5cb6b21

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude Bot reviewed Apr 26, 2026

View reviewed changes

This was referenced Apr 26, 2026

Add chat template in sgl b300 mtp #1183

Closed

Nothing to see here #1189

Closed

Merge branch 'main' into chore/sgl-dsv4-b300-mtp-2

74ee570

Oseltamivir merged commit a53ae04 into main Apr 27, 2026
11 of 19 checks passed

Oseltamivir deleted the chore/sgl-dsv4-b300-mtp-2 branch April 27, 2026 01:35

github-project-automation Bot moved this to Done in InferenceMAX Board Apr 27, 2026

This was referenced Apr 27, 2026

[AMD/ROCM] Update gptoss-fp4-mi355x-atom config #1195

Open

[AMD/Hyperloom] Tune dsr1-fp8-mi355x-sglang: --num-continuous-decode-steps 4 → 8 #1243

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dsv4-fp4-b300-sglang-mtp: pass --dsv4 to use DSv4 chat template#1182

dsv4-fp4-b300-sglang-mtp: pass --dsv4 to use DSv4 chat template#1182
Oseltamivir merged 4 commits intomainfrom
chore/sgl-dsv4-b300-mtp-2

cquil11 commented Apr 26, 2026

Uh oh!

github-actions Bot commented Apr 26, 2026

Uh oh!

claude Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cquil11 commented Apr 26, 2026

Summary

Test plan

Uh oh!

github-actions Bot commented Apr 26, 2026

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

What's wrong

How it manifests

Why existing checks don't catch it

Step-by-step proof

Impact

Fix

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants