Skip to content

dsv4-fp4-b300-sglang-mtp: pass --dsv4 to use DSv4 chat template#1182

Merged
Oseltamivir merged 4 commits intomainfrom
chore/sgl-dsv4-b300-mtp-2
Apr 27, 2026
Merged

dsv4-fp4-b300-sglang-mtp: pass --dsv4 to use DSv4 chat template#1182
Oseltamivir merged 4 commits intomainfrom
chore/sgl-dsv4-b300-mtp-2

Conversation

@cquil11
Copy link
Copy Markdown
Collaborator

@cquil11 cquil11 commented Apr 26, 2026

Summary

  • Follow-up to sglang dsv4 MTP #1166 (sglang dsv4 MTP). Adds --dsv4 to the run_benchmark_serving call in benchmarks/single_node/dsv4_fp4_b300_sglang_mtp.sh, which routes prompts through encoding_dsv4.py (added in [DSv4] add jinja chat template support #1153) and emits the <bos><User>...<Assistant><think> framing DeepSeek-V4-Pro expects.
  • sglang dsv4 MTP #1166 had to strip --use-chat-template because the DSv4-Pro tokenizer ships without a jinja chat_template--dsv4 is exactly the escape hatch for that, since it implies --use-chat-template but uses the self-contained DSv4 encoder instead of the tokenizer's missing template.
  • Restores AGENTS.md compliance: MTP scripts must benchmark against chat-formatted inputs because EAGLE acceptance rate silently regresses on raw random tokens.
  • Adds matching perf-changelog.yaml entry to retrigger the dsv4-fp4-b300-sglang-mtp sweep.

Test plan

  • CI sweep for dsv4-fp4-b300-sglang-mtp runs on this PR (perf-changelog entry triggers it)
  • Compare EAGLE acceptance rate / output throughput vs. the no-chat-template numbers from sglang dsv4 MTP #1166

🤖 Generated with Claude Code

Routes benchmark prompts through encoding_dsv4.py (added in PR #1153)
so DeepSeek-V4-Pro receives the <bos><User>...<Assistant><think>
framing it was trained against. PR #1166 had to drop --use-chat-template
because the DSv4-Pro tokenizer has no jinja chat_template, which is
exactly what --dsv4 sidesteps.

Restores AGENTS.md compliance for MTP scripts (EAGLE acceptance rate
silently regresses against raw random tokens).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cquil11 cquil11 requested a review from a team April 26, 2026 18:33
@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 perf-changelog.yaml:1912 — The new perf-changelog.yaml entry at line 1912 has pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX — a literal XXX placeholder that was never substituted with this PR's number (1182). Every other entry in the file uses a real PR number, and AGENTS.md documents that format. Trivial fix: replace XXX with 1182.

    Extended reasoning...

    What's wrong

    The newly-added perf-changelog entry has an unresolved placeholder in its pr-link field:

    - config-keys:
        - dsv4-fp4-b300-sglang-mtp
      description:
        - "Pass --dsv4 (PR #1153) to run_benchmark_serving ..."
        ...
      pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX

    The literal string XXX is the template marker shown in AGENTS.md (lines 266 and 289) for new entries — it should have been replaced with the actual PR number (1182) before opening this PR.

    How it manifests

    Any consumer of perf-changelog.yaml that turns pr-link into a clickable hyperlink (changelog renderers, the perf-tracking dashboard, anything that surfaces the changelog as docs) will produce a broken link pointing at /pull/XXX, which is not a valid PR. The audit trail tying this changelog row back to its source PR is lost.

    Why existing checks don't catch it

    I confirmed via the verifier observations that utils/matrix_logic/validation.py only requires pr_link to be a non-empty string — it doesn't validate that the URL points to a real PR or even that the trailing path component is numeric. CI sweep triggering is keyed on config-keys, not pr-link, so the sweep for dsv4-fp4-b300-sglang-mtp will still run; the broken link is purely cosmetic / documentation.

    Step-by-step proof

    1. Open perf-changelog.yaml and jump to the diff hunk at line 1912.
    2. The added entry's last field is literally pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX.
    3. Compare against the other entries added in adjacent diffs — every one ends in a numeric PR id (e.g. /pull/1166, /pull/1174, /pull/1178, /pull/1155). A grep for pull/XXX across the file matches only this new line; every other entry (~150+) uses a real number.
    4. The PR metadata for this change shows it is PR #1182, so the correct value is /pull/1182.
    5. AGENTS.md (lines ~266, ~289) shows pull/XXX as the placeholder template authors are expected to substitute when adding a new entry. That substitution was missed here.

    Impact

    • Functional: none — the sweep still triggers because triggering keys off config-keys.
    • Documentation/audit: a broken link in the rendered changelog and a lost back-reference from the entry to PR #1182.
    • Convention: violates the established format used by every other entry in the file.

    Fix

    Replace pull/XXX with pull/1182 on the new line. One-character (well, three-character) edit.

This was referenced Apr 26, 2026
@Oseltamivir Oseltamivir merged commit a53ae04 into main Apr 27, 2026
11 of 19 checks passed
@Oseltamivir Oseltamivir deleted the chore/sgl-dsv4-b300-mtp-2 branch April 27, 2026 01:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Development

Successfully merging this pull request may close these issues.

2 participants