dsv4-fp4-b300-sglang-mtp: pass --dsv4 to use DSv4 chat template #1182
Oseltamivir merged 4 commits into `main`
Routes benchmark prompts through `encoding_dsv4.py` (added in PR #1153) so DeepSeek-V4-Pro receives the `<bos><User>...<Assistant><think>` framing it was trained on. PR #1166 had to drop `--use-chat-template` because the DSv4-Pro tokenizer ships without a jinja `chat_template`; `--dsv4` sidesteps exactly that gap. This restores AGENTS.md compliance for MTP scripts, since the EAGLE acceptance rate silently regresses when benchmarking against raw random tokens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
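The flag interplay described in this commit message can be sketched in Python. This is a hypothetical illustration, not the actual `run_benchmark_serving` or `encoding_dsv4.py` code: `frame_dsv4`, `resolve_formatter`, and `build_parser` are invented names, and the marker strings use the simplified `<bos><User>...<Assistant><think>` spelling from this PR description rather than the model's real special tokens. It shows why `--dsv4` can imply `--use-chat-template` while never needing the tokenizer's missing jinja template.

```python
import argparse
from typing import Callable, Optional

Formatter = Callable[[str], str]

# Simplified marker strings as written in this PR's description (assumption:
# the real encoding_dsv4.py may spell its special tokens differently).
BOS, USER, ASSISTANT, THINK = "<bos>", "<User>", "<Assistant>", "<think>"


def frame_dsv4(user_text: str) -> str:
    """Hypothetical stand-in for the DSv4 encoder: frame a single user turn."""
    return f"{BOS}{USER}{user_text}{ASSISTANT}{THINK}"


def resolve_formatter(
    args: argparse.Namespace, tokenizer_template: Optional[Formatter]
) -> Formatter:
    """Choose a prompt formatter (hypothetical logic, not the real script)."""
    if args.dsv4:
        # --dsv4 implies --use-chat-template but never consults the tokenizer,
        # so a tokenizer without a jinja chat_template is not a problem.
        args.use_chat_template = True
        return frame_dsv4
    if args.use_chat_template:
        if tokenizer_template is None:
            # No template to apply: the situation that led PR #1166 to drop
            # --use-chat-template for DSv4-Pro.
            raise ValueError("tokenizer has no chat_template")
        return tokenizer_template
    # No chat framing at all: the raw-token input AGENTS.md warns about for MTP.
    return lambda text: text


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument("--use-chat-template", action="store_true")
    parser.add_argument("--dsv4", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--dsv4"])
    print(resolve_formatter(args, tokenizer_template=None)("Hello"))
```

Running the script prints `<bos><User>Hello<Assistant><think>`. Keeping the DSv4 encoder self-contained means the benchmark does not depend on the tokenizer shipping a template at all.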
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please open a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. Failures are often just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
Additional findings (outside the current diff; the PR may have been updated during review):
- 🟡 `perf-changelog.yaml:1912`: The new `perf-changelog.yaml` entry at line 1912 has `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX`, a literal `XXX` placeholder that was never replaced with this PR's number (1182). Every other entry in the file uses a real PR number, and AGENTS.md documents that format. Trivial fix: replace `XXX` with `1182`.
What's wrong
The newly added perf-changelog entry has an unresolved placeholder in its `pr-link` field. The entry lists `config-keys: - dsv4-fp4-b300-sglang-mtp` and `description: - "Pass --dsv4 (PR #1153) to run_benchmark_serving ..."`, ..., and ends with `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX`.
The literal string `XXX` is the template marker shown in AGENTS.md (lines 266 and 289) for new entries; it should have been replaced with the actual PR number (1182) before opening this PR.

How it manifests
Any consumer of `perf-changelog.yaml` that turns `pr-link` into a clickable hyperlink (changelog renderers, the perf-tracking dashboard, or anything else that surfaces the changelog as docs) will produce a broken link pointing at `/pull/XXX`, which is not a valid PR. The audit trail tying this changelog entry back to its source PR is lost.

Why existing checks don't catch it
I confirmed via the verifier observations that `utils/matrix_logic/validation.py` only requires `pr_link` to be a non-empty string; it doesn't validate that the URL points to a real PR, or even that the trailing path component is numeric. CI sweep triggering is keyed on `config-keys`, not `pr-link`, so the sweep for `dsv4-fp4-b300-sglang-mtp` will still run; the broken link is purely a cosmetic/documentation issue.

Step-by-step proof
- Open `perf-changelog.yaml` and jump to the diff hunk at line 1912.
- The added entry's last field is literally `pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX`.
- Compare against the other entries added in adjacent diffs: every one ends in a numeric PR id (e.g. `/pull/1166`, `/pull/1174`, `/pull/1178`, `/pull/1155`). A grep for `pull/XXX` across the file matches only this new line; every other entry (~150+) uses a real number.
- The PR metadata shows this change is PR #1182, so the correct value is `/pull/1182`.
- AGENTS.md (lines ~266, ~289) shows `pull/XXX` as the placeholder template that authors are expected to substitute when adding a new entry. That substitution was missed here.
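The grep in the proof, tightened into the numeric-id rule that `validation.py` does not enforce, can be sketched as a small Python scan. This is a hypothetical helper, not part of the repo: `find_bad_pr_links` and `PR_LINK_RE` are invented names, the scan reads the file line by line instead of parsing YAML, and it assumes every `pr-link` should match `https://github.com/SemiAnalysisAI/InferenceX/pull/<digits>`.

```python
import re

# Assumed canonical form: the InferenceX repo URL followed by a numeric PR id.
PR_LINK_RE = re.compile(r"^https://github\.com/SemiAnalysisAI/InferenceX/pull/\d+$")


def find_bad_pr_links(changelog_text: str) -> list[tuple[int, str]]:
    """Return (line_number, url) for each pr-link whose PR id is not numeric."""
    bad = []
    for lineno, line in enumerate(changelog_text.splitlines(), start=1):
        # Tolerate the key appearing as the first field of a list item.
        stripped = line.strip().removeprefix("- ")
        if not stripped.startswith("pr-link:"):
            continue
        # Drop the key and any surrounding quotes around the URL.
        url = stripped[len("pr-link:"):].strip().strip("'\"")
        if not PR_LINK_RE.match(url):
            bad.append((lineno, url))
    return bad


sample = (
    "- config-keys:\n"
    "    - dsv4-fp4-b300-sglang-mtp\n"
    "  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXX\n"
    "- config-keys:\n"
    "    - other-config\n"
    "  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1166\n"
)
print(find_bad_pr_links(sample))
```

On the two-entry sample it flags only line 3, the `pull/XXX` link; after substituting `1182` the result is empty. A real check would more likely live next to the existing non-empty-string test and operate on the parsed entries rather than raw lines.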
Impact
- Functional: none; the sweep still triggers because triggering keys off `config-keys`.
- Documentation/audit: a broken link in the rendered changelog and a lost back-reference from the entry to PR #1182.
- Convention: violates the established format used by every other entry in the file.
Fix
Replace `pull/XXX` with `pull/1182` on the new line: a single-token substitution.
Summary
- Pass `--dsv4` to the `run_benchmark_serving` call in `benchmarks/single_node/dsv4_fp4_b300_sglang_mtp.sh`, which routes prompts through `encoding_dsv4.py` (added in #1153, "[DSv4] add jinja chat template support") and emits the `<bos><User>...<Assistant><think>` framing DeepSeek-V4-Pro expects.
- PR #1166 had to drop `--use-chat-template` because the DSv4-Pro tokenizer ships without a jinja `chat_template`. `--dsv4` is exactly the escape hatch for that: it implies `--use-chat-template` but uses the self-contained DSv4 encoder instead of the tokenizer's missing template.
- Restores `AGENTS.md` compliance: MTP scripts must benchmark against chat-formatted inputs because the EAGLE acceptance rate silently regresses on raw random tokens.
- Adds a `perf-changelog.yaml` entry to retrigger the `dsv4-fp4-b300-sglang-mtp` sweep.

Test plan

- `dsv4-fp4-b300-sglang-mtp` runs on this PR (the perf-changelog entry triggers it)

🤖 Generated with Claude Code