Skip to content

dsv4-fp4-b300-sglang: revert to #1143 low-latency-only baseline#1184

Merged
cquil11 merged 3 commits intomainfrom
chore/revert-dsv4-fp4-b300-sglang-to-1143-baseline
Apr 26, 2026
Merged

dsv4-fp4-b300-sglang: revert to #1143 low-latency-only baseline#1184
cquil11 merged 3 commits intomainfrom
chore/revert-dsv4-fp4-b300-sglang-to-1143-baseline

Conversation

@cquil11
Copy link
Copy Markdown
Collaborator

@cquil11 cquil11 commented Apr 26, 2026

Summary

What's reverted

PR Change Reverted?
#1143 initial low-latency-only entry matched (this PR re-creates the same script/yaml state)
#1132 restore balanced/max-throughput rows yes
#1158 fix sgl b200/b300 dpsk-v4 script (b300 portion) yes
#1173 floor --max-running-requests at 8 yes
#1174 SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1 for DP-attn yes
#1178 changelog-only retrigger yes

Not touched

Test plan

  • Sweep run on B300 completes end-to-end against the low-latency-only matrix.
  • Result filenames reflect tp=8 ep=1 dpa=false rows only.

🤖 Generated with Claude Code

Reverts the matrix expansion (#1132), script edits (#1158, #1173, #1174),
and changelog retriggers (#1178) on top of the original #1143 entry.
Restores the script and config block to their #1143 state and clears
all prior dsv4-fp4-b300-sglang changelog entries to start fresh.

The dsv4-fp4-b300-sglang-mtp config (#1166) is untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

cquil11 and others added 2 commits April 26, 2026 14:48
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cquil11 cquil11 merged commit 0f4e42f into main Apr 26, 2026
17 checks passed
@cquil11 cquil11 deleted the chore/revert-dsv4-fp4-b300-sglang-to-1143-baseline branch April 26, 2026 19:52
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🔴 perf-changelog.yaml:1861-1869 — The new changelog entry appended at the bottom of perf-changelog.yaml (lines 1861-1869) is missing the required pr-link field. ChangelogEntry in utils/matrix_logic/validation.py:338-345 declares pr_link: str = Field(alias="pr-link") with no default and extra="forbid", so process_changelog.py will raise ValidationError when loading this entry and block the run-sweep workflow. Add a pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1184 line to fix.

    Extended reasoning...

    What the bug is

    The new entry appended at the end of perf-changelog.yaml (the dsv4-fp4-b300-sglang revert entry, lines 1861-1869) lacks a pr-link field. Every one of the 209 prior entries in the file has one — and the just-deleted entries this PR replaces all had one — so the omission is clearly an oversight.

    Why it breaks

    utils/matrix_logic/validation.py:338-345 defines:

    class ChangelogEntry(BaseModel):
        model_config = ConfigDict(extra="forbid", populate_by_name=True)
        config_keys: list[str] = Field(alias="config-keys")
        description: list[str]
        pr_link: str = Field(alias="pr-link")  # required, no default
        evals_only: bool | None = Field(default=None, alias="evals-only")

    pr_link is declared without a default, making it mandatory. utils/process_changelog.py:144 calls ChangelogEntry.model_validate(entry_data) for every entry parsed from the YAML diff additions, so when this entry is processed Pydantic will raise:

    pydantic_core._pydantic_core.ValidationError: 1 validation error for ChangelogEntry
    pr-link
      Field required [type=missing, ...]
    

    Why nothing else catches this

    The author wrote a clean YAML block (config-keys, description) and the extra="forbid" config means there is no silent acceptance — the missing field is simply unfilled. There is no upstream default in the class, no fallback in process_changelog.py, and no other layer that would supply a placeholder. .github/workflows/run-sweep.yml triggers process_changelog.py on PRs/pushes touching perf-changelog.yaml, so the CI job is what will surface the error.

    Step-by-step proof

    1. CI runs the changelog processor on this PR (touches perf-changelog.yaml).
    2. process_changelog.py parses the diff additions and yields one new entry — the dict at the bottom of the file.
    3. That dict has keys config-keys and description only (count: grep -c "^- config-keys:" perf-changelog.yaml → 210; grep -c "^ pr-link:" perf-changelog.yaml → 209; exactly one entry is short).
    4. ChangelogEntry.model_validate(entry_data) is called at validation.py invocation site (utils/process_changelog.py:144).
    5. Pydantic checks required fields, sees pr-link is absent, raises ValidationError.
    6. The workflow fails; the changelog never gets processed.

    Impact

    This will block the run-sweep GitHub Actions workflow on this PR. Until the line is added, the changelog cannot be processed and the sweep run cannot proceed.

    How to fix

    Append a pr-link line to the new entry:

    - config-keys:
        - dsv4-fp4-b300-sglang
      description:
        - "Revert dsv4-fp4-b300-sglang to the #1143 low-latency-only baseline"
        - "Image: lmsysorg/sglang:deepseek-v4-b300"
        - "Recipe: TP=8, EP=1, no DP-attn, no DeepEP — full concurrency sweep retained"
        - "Configs: 1k1k conc 4-1024, 8k1k conc 4-512"
        - "Reverts the matrix expansion / script changes from #1132, #1158, #1173, #1174, #1178; clears prior changelog entries to start fresh"
      pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1184

Comment thread perf-changelog.yaml
Comment on lines 1779 to 1784
- "Prefix caching and speculative decoding disabled for baseline numbers"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1131

- config-keys:
- dsv4-fp4-b300-sglang
description:
- "Restore the recipe-per-CONC split (low-latency / balanced / max-throughput) on top of the low-latency-only fallback from #1143; the DeepEP FP8 weight-postprocess path is fixed, so the high-throughput scenario runs again"
- "Recipes from https://docs.sglang.io/cookbook/autoregressive/DeepSeek/DeepSeek-V4"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1158

- config-keys:
- dsv4-fp8-mi355x-sglang
description:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 This PR deletes six prior dsv4-fp4-b300-sglang changelog entries from perf-changelog.yaml, but utils/process_changelog.py:get_added_lines explicitly raises ValueError on any non-whitespace deletion in this file. The run-sweep workflow invokes that script on every PR with a sweep label and on every push to main, so the very first deletion (- config-keys:) will abort the workflow before any benchmark runs — defeating the PR's stated test plan. Fix by dropping the wipe and only appending the new entry.

Extended reasoning...

What the bug is

utils/process_changelog.py enforces an append-only invariant on perf-changelog.yaml. The relevant code at utils/process_changelog.py:24-37 (get_added_lines) iterates over the unified diff for the changelog file and explicitly rejects any non-whitespace deletion:

if line.startswith("-") and not line.startswith("---"):
    deleted_content = line[1:]
    if deleted_content.strip():
        raise ValueError(
            f"Deletions are not allowed in {filepath}. "
            f"Only additions to the changelog are permitted. "
            f"Found deleted line: {deleted_content}"
        )

This PR's diff does the opposite: it removes six prior dsv4-fp4-b300-sglang entries (PRs #1143, #1132, #1158, #1173, #1174, #1178) spanning lines 1779-1793 and 1838-1855 in the diff. There are roughly 30 non-whitespace deleted lines (- config-keys:, - dsv4-fp4-b300-sglang, the description bullets, and pr-link: ...), all of which strip to non-empty content.

Why this is a hard CI blocker

The validation is wired into .github/workflows/run-sweep.yml:91-101 (the setup job), which calls process_changelog.py whenever the workflow runs. The workflow runs in two contexts that matter here:

  1. On PR with the sweep-enabled / full-sweep-enabled label — required to actually exercise the sweep this PR is asking for.
  2. On push to main after merge — runs unconditionally unless the commit message contains [skip-sweep].

In both contexts, the deletion check fires before any pydantic ChangelogEntry validation. There is no override flag.

Proof: walking through the diff

Take the first non-whitespace deletion in the diff, around line 1782:

-- config-keys:
-    - dsv4-fp4-b300-sglang
-  description:
-    - "Restore the recipe-per-CONC split ... from #1143; ..."

When get_added_lines processes this:

  • Line -- config-keys: → starts with -, doesn't start with ---; deleted_content = "- config-keys:", .strip() is non-empty → ValueError raised on the very first iteration.

The setup job exits non-zero, no benchmark or sweep job runs, and the PR's test plan ("Sweep run on B300 completes end-to-end") cannot be satisfied. Post-merge, the same failure occurs on the push-to-main run, so the new entry never triggers a sweep — directly defeating the PR's purpose.

What contradicts existing safeguards

The PR description literally says "Wipes all six prior dsv4-fp4-b300-sglang changelog entries (from #1143, #1132, #1158, #1173, #1174, #1178) and adds a single fresh entry to start clean." This is in direct conflict with the append-only invariant the code enforces. This bug is independent of any pr-link issue — even if the new entry were perfect, the deletions would still trip get_added_lines first.

How to fix

Two options:

  1. Drop the wipe — leave the prior six entries in place and only append the new "revert to [NVIDIA] chore: B300 single node DeepSeek v4 SGLang LOW LATENCY ONLY #1143 baseline" entry. The reader can already infer the chronological revert from the PR-link history.
  2. Relax the validator — if intentional rewrites are desired, add an opt-in escape (e.g., a [changelog-rewrite] commit-message marker, or a squash: block in the YAML), but this is a larger policy change and should not be smuggled in here.

Option (1) is the obviously safe path.

cquil11 added a commit that referenced this pull request Apr 26, 2026
Mirrors the b300 revert in #1184. Restores benchmarks/single_node/
dsv4_fp4_b200.sh and the dsv4-fp4-b200-sglang block in nvidia-master.yaml
to their pre-#1158 state (= post-#1131 baseline) — un-pins the image
digest and restores conc-start=4 in the low-latency rows.

No perf-changelog edit needed; #1158 did not add a b200 changelog entry.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cquil11 added a commit that referenced this pull request Apr 26, 2026
…C split + DP-attn SWA tweak (#1185)

* dsv4-fp4-b300-sglang: recipe-per-CONC split + DP-attn SWA tweak

Squashes the cumulative changes from #1158 and #1174 into a single
commit on top of the #1184 baseline. Excludes the iterative
--max-running-requests floor from #1173.

- Image pinned to lmsysorg/sglang:deepseek-v4-b300@sha256:26e116bd...
- Search space: TP8/EP1 conc=1, TP4/EP1 conc=32, TP4/EP4 dp-attn
  conc=512 for both 1k1k and 8k1k
- Script dispatches on DP_ATTENTION knob: TP-only (flashinfer_mxfp4)
  vs DP-attn (deepep + prefill-delayer + mega_moe env vars)
- DP-attn path enables SGLANG_OPT_SWA_EVICT_DROP_PAGE_MARGIN=1

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf-changelog: add pr-link for #1185

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Apply suggestion from @Qiaolin-Yu

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Qiaolin Yu <liin1211@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Development

Successfully merging this pull request may close these issues.

1 participant