dsv4-fp4-b300-sglang: revert to #1143 low-latency-only baseline#1184
dsv4-fp4-b300-sglang: revert to #1143 low-latency-only baseline#1184
Conversation
Reverts the matrix expansion (#1132), script edits (#1158, #1173, #1174), and changelog retriggers (#1178) on top of the original #1143 entry. Restores the script and config block to their #1143 state and clears all prior dsv4-fp4-b300-sglang changelog entries to start fresh. The dsv4-fp4-b300-sglang-mtp config (#1166) is untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Additional findings (outside current diff — PR may have been updated during review):
-
🔴
perf-changelog.yaml:1861-1869— The new changelog entry appended at the bottom ofperf-changelog.yaml(lines 1861-1869) is missing the requiredpr-linkfield.ChangelogEntryinutils/matrix_logic/validation.py:338-345declarespr_link: str = Field(alias="pr-link")with no default andextra="forbid", soprocess_changelog.pywill raiseValidationErrorwhen loading this entry and block the run-sweep workflow. Add apr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1184line to fix.Extended reasoning...
What the bug is
The new entry appended at the end of
perf-changelog.yaml(the dsv4-fp4-b300-sglang revert entry, lines 1861-1869) lacks apr-linkfield. Every one of the 209 prior entries in the file has one — and the just-deleted entries this PR replaces all had one — so the omission is clearly an oversight.Why it breaks
utils/matrix_logic/validation.py:338-345defines:class ChangelogEntry(BaseModel): model_config = ConfigDict(extra="forbid", populate_by_name=True) config_keys: list[str] = Field(alias="config-keys") description: list[str] pr_link: str = Field(alias="pr-link") # required, no default evals_only: bool | None = Field(default=None, alias="evals-only")
pr_linkis declared without a default, making it mandatory.utils/process_changelog.py:144callsChangelogEntry.model_validate(entry_data)for every entry parsed from the YAML diff additions, so when this entry is processed Pydantic will raise:pydantic_core._pydantic_core.ValidationError: 1 validation error for ChangelogEntry pr-link Field required [type=missing, ...]Why nothing else catches this
The author wrote a clean YAML block (
config-keys,description) and theextra="forbid"config means there is no silent acceptance — the missing field is simply unfilled. There is no upstream default in the class, no fallback inprocess_changelog.py, and no other layer that would supply a placeholder..github/workflows/run-sweep.ymltriggersprocess_changelog.pyon PRs/pushes touchingperf-changelog.yaml, so the CI job is what will surface the error.Step-by-step proof
- CI runs the changelog processor on this PR (touches
perf-changelog.yaml). process_changelog.pyparses the diff additions and yields one new entry — the dict at the bottom of the file.- That dict has keys
config-keysanddescriptiononly (count:grep -c "^- config-keys:" perf-changelog.yaml→ 210;grep -c "^ pr-link:" perf-changelog.yaml→ 209; exactly one entry is short). ChangelogEntry.model_validate(entry_data)is called at validation.py invocation site (utils/process_changelog.py:144).- Pydantic checks required fields, sees
pr-linkis absent, raisesValidationError. - The workflow fails; the changelog never gets processed.
Impact
This will block the
run-sweepGitHub Actions workflow on this PR. Until the line is added, the changelog cannot be processed and the sweep run cannot proceed.How to fix
Append a
pr-linkline to the new entry:- config-keys: - dsv4-fp4-b300-sglang description: - "Revert dsv4-fp4-b300-sglang to the #1143 low-latency-only baseline" - "Image: lmsysorg/sglang:deepseek-v4-b300" - "Recipe: TP=8, EP=1, no DP-attn, no DeepEP — full concurrency sweep retained" - "Configs: 1k1k conc 4-1024, 8k1k conc 4-512" - "Reverts the matrix expansion / script changes from #1132, #1158, #1173, #1174, #1178; clears prior changelog entries to start fresh" pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1184
- CI runs the changelog processor on this PR (touches
| - "Prefix caching and speculative decoding disabled for baseline numbers" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1131 | ||
|
|
||
| - config-keys: | ||
| - dsv4-fp4-b300-sglang | ||
| description: | ||
| - "Restore the recipe-per-CONC split (low-latency / balanced / max-throughput) on top of the low-latency-only fallback from #1143; the DeepEP FP8 weight-postprocess path is fixed, so the high-throughput scenario runs again" | ||
| - "Recipes from https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V4" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1158 | ||
|
|
||
| - config-keys: | ||
| - dsv4-fp8-mi355x-sglang | ||
| description: |
There was a problem hiding this comment.
🔴 This PR deletes six prior dsv4-fp4-b300-sglang changelog entries from perf-changelog.yaml, but utils/process_changelog.py:get_added_lines explicitly raises ValueError on any non-whitespace deletion in this file. The run-sweep workflow invokes that script on every PR with a sweep label and on every push to main, so the very first deletion (- config-keys:) will abort the workflow before any benchmark runs — defeating the PR's stated test plan. Fix by dropping the wipe and only appending the new entry.
Extended reasoning...
What the bug is
utils/process_changelog.py enforces an append-only invariant on perf-changelog.yaml. The relevant code at utils/process_changelog.py:24-37 (get_added_lines) iterates over the unified diff for the changelog file and explicitly rejects any non-whitespace deletion:
if line.startswith("-") and not line.startswith("---"):
deleted_content = line[1:]
if deleted_content.strip():
raise ValueError(
f"Deletions are not allowed in {filepath}. "
f"Only additions to the changelog are permitted. "
f"Found deleted line: {deleted_content}"
)This PR's diff does the opposite: it removes six prior dsv4-fp4-b300-sglang entries (PRs #1143, #1132, #1158, #1173, #1174, #1178) spanning lines 1779-1793 and 1838-1855 in the diff. There are roughly 30 non-whitespace deleted lines (- config-keys:, - dsv4-fp4-b300-sglang, the description bullets, and pr-link: ...), all of which strip to non-empty content.
Why this is a hard CI blocker
The validation is wired into .github/workflows/run-sweep.yml:91-101 (the setup job), which calls process_changelog.py whenever the workflow runs. The workflow runs in two contexts that matter here:
- On PR with the
sweep-enabled/full-sweep-enabledlabel — required to actually exercise the sweep this PR is asking for. - On push to main after merge — runs unconditionally unless the commit message contains
[skip-sweep].
In both contexts, the deletion check fires before any pydantic ChangelogEntry validation. There is no override flag.
Proof: walking through the diff
Take the first non-whitespace deletion in the diff, around line 1782:
-- config-keys:
- - dsv4-fp4-b300-sglang
- description:
- - "Restore the recipe-per-CONC split ... from #1143; ..."
When get_added_lines processes this:
- Line
-- config-keys:→ starts with-, doesn't start with---;deleted_content = "- config-keys:",.strip()is non-empty →ValueErrorraised on the very first iteration.
The setup job exits non-zero, no benchmark or sweep job runs, and the PR's test plan ("Sweep run on B300 completes end-to-end") cannot be satisfied. Post-merge, the same failure occurs on the push-to-main run, so the new entry never triggers a sweep — directly defeating the PR's purpose.
What contradicts existing safeguards
The PR description literally says "Wipes all six prior dsv4-fp4-b300-sglang changelog entries (from #1143, #1132, #1158, #1173, #1174, #1178) and adds a single fresh entry to start clean." This is in direct conflict with the append-only invariant the code enforces. This bug is independent of any pr-link issue — even if the new entry were perfect, the deletions would still trip get_added_lines first.
How to fix
Two options:
- Drop the wipe — leave the prior six entries in place and only append the new "revert to [NVIDIA] chore: B300 single node DeepSeek v4 SGLang LOW LATENCY ONLY #1143 baseline" entry. The reader can already infer the chronological revert from the PR-link history.
- Relax the validator — if intentional rewrites are desired, add an opt-in escape (e.g., a
[changelog-rewrite]commit-message marker, or asquash:block in the YAML), but this is a larger policy change and should not be smuggled in here.
Option (1) is the obviously safe path.
Mirrors the b300 revert in #1184. Restores benchmarks/single_node/ dsv4_fp4_b200.sh and the dsv4-fp4-b200-sglang block in nvidia-master.yaml to their pre-#1158 state (= post-#1131 baseline) — un-pins the image digest and restores conc-start=4 in the low-latency rows. No perf-changelog edit needed; #1158 did not add a b200 changelog entry. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…C split + DP-attn SWA tweak (#1185) * dsv4-fp4-b300-sglang: recipe-per-CONC split + DP-attn SWA tweak Squashes the cumulative changes from #1158 and #1174 into a single commit on top of the #1184 baseline. Excludes the iterative --max-running-requests floor from #1173. - Image pinned to lmsysorg/sglang:deepseek-v4-b300@sha256:26e116bd... - Search space: TP8/EP1 conc=1, TP4/EP1 conc=32, TP4/EP4 dp-attn conc=512 for both 1k1k and 8k1k - Script dispatches on DP_ATTENTION knob: TP-only (flashinfer_mxfp4) vs DP-attn (deepep + prefill-delayer + mega_moe env vars) - DP-attn path enables SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf-changelog: add pr-link for #1185 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Apply suggestion from @Qiaolin-Yu --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
Summary
dsv4-fp4-b300-sglangto the state it was in immediately after [NVIDIA] chore: B300 single node DeepSeek v4 SGLang LOW LATENCY ONLY #1143.benchmarks/single_node/dsv4_fp4_b300_sglang.sh(script body) and thedsv4-fp4-b300-sglangblock in.github/configs/nvidia-master.yaml(low-latency only matrix:tp=8 ep=1 dpa=false, 1k1k conc 4–1024, 8k1k conc 4–512).dsv4-fp4-b300-sglangchangelog entries (from [NVIDIA] chore: B300 single node DeepSeek v4 SGLang LOW LATENCY ONLY #1143, [NVIDIA] chore: B300 single node DeepSeek v4 SGLang #1132, fix sgl b200/b300 dpsk-v4 script #1158, dsv4-fp4 sglang b300: floor --max-running-requests at 8 #1173, dsv4-b300-sglang: retry #1174, dsv4-b300-sglang: retry #1178) and adds a single fresh entry to start clean.What's reverted
--max-running-requestsat 8SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1for DP-attnNot touched
dsv4-fp4-b300-sglang-mtpconfig and its sglang dsv4 MTP #1166 changelog entry.dsv4_fp4_b300_sglang.sh).nvidia-master.yaml/perf-changelog.yaml.Test plan
tp=8 ep=1 dpa=falserows only.🤖 Generated with Claude Code