Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 13 additions & 12 deletions .github/configs/nvidia-master.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -1879,18 +1879,19 @@ dsv4-fp4-b300-sglang:
- isl: 1024
osl: 1024
search-space:
- { tp: 8, ep: 1, conc-start: 1, conc-end: 1 }
- { tp: 4, ep: 1, conc-start: 32, conc-end: 32 }
- { tp: 4, ep: 4, dp-attn: true, conc-start: 512, conc-end: 512 }
- { tp: 8, ep: 8, dp-attn: true, conc-start: 8192, conc-end: 8192 }
- isl: 8192
osl: 1024
search-space:
- { tp: 8, ep: 1, conc-start: 1, conc-end: 1 }
- { tp: 4, ep: 1, conc-start: 32, conc-end: 32 }
- { tp: 4, ep: 4, dp-attn: true, conc-start: 512, conc-end: 512 }
- { tp: 8, ep: 8, dp-attn: true, conc-start: 2048, conc-end: 2048 }
- { tp: 8, ep: 8, dp-attn: true, conc-start: 4096, conc-end: 4096 }
# - { tp: 8, ep: 1, conc-start: 1, conc-end: 1 }
# - { tp: 4, ep: 1, conc-start: 32, conc-end: 32 }
# - { tp: 4, ep: 4, dp-attn: true, conc-start: 512, conc-end: 512 }
# - { tp: 8, ep: 8, dp-attn: true, conc-start: 8192, conc-end: 8192 }
- { tp: 8, ep: 8, dp-attn: true, conc-start: 12288, conc-end: 12288 }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 This PR adds a new conc=12288 recipe to dsv4-fp4-b300-sglang (new search-space row in nvidia-master.yaml plus matching CONC=12288 branch in dsv4_fp4_b300_sglang.sh) but does not add a corresponding entry to perf-changelog.yaml. Per AGENTS.md, perf-changelog.yaml is the documented trigger for benchmark runs on push-to-main, so without an entry the new recipe will not actually run. Please append a perf-changelog.yaml entry for dsv4-fp4-b300-sglang describing the new conc=12288 recipe and linking to this PR (mirroring the predecessor PR #1209 which added the conc=8192 entry).

Extended reasoning...

What the bug is and how it manifests

AGENTS.md is explicit that adding a new benchmark configuration requires updating perf-changelog.yaml. The section "Adding a New Benchmark Configuration" (AGENTS.md:192-196) lists "Add corresponding entry to perf-changelog.yaml to trigger benchmark" as step 2, the "Updating Docker Images" section (AGENTS.md:296) marks it as a MUST, and AGENTS.md:174 confirms "Changes to perf-changelog.yaml trigger benchmark runs." This PR adds a brand-new search-space row { tp: 8, ep: 8, dp-attn: true, conc-start: 12288, conc-end: 12288 } at .github/configs/nvidia-master.yaml:1886 plus the matching CONC=12288 branch in benchmarks/single_node/dsv4_fp4_b300_sglang.sh (with new tunings: CUDA_GRAPH_MAX_BS=1600, MAX_RUNNING_REQUESTS=12288, MEM_FRACTION_STATIC=0.72, SGLANG_OPT_USE_ONLINE_COMPRESS=1, etc.), but the changeset only touches those two files — perf-changelog.yaml is untouched.

The specific code path that triggers it

Per AGENTS.md, the sweep workflow on push-to-main keys off entries appended to perf-changelog.yaml to determine which configurations to (re)run. With no new changelog entry referencing dsv4-fp4-b300-sglang and the conc=12288 recipe, the workflow has no signal that there is anything new to benchmark, so the freshly added search-space row and script branch will sit dormant.

Why existing code doesn't prevent it

The repo does not have an automated check that cross-references new search-space rows in nvidia-master.yaml against perf-changelog.yaml entries. Compliance is enforced by convention/documentation (AGENTS.md) and prior practice. The directly analogous predecessor PR #1209 (commit a3330bb), which added the conc=8192 recipe to this exact same dsv4-fp4-b300-sglang config, did append a perf-changelog.yaml block (currently the last entry in the file) describing the new conc=8192 recipe and linking the PR — establishing the expected pattern that this PR omits.

Impact

The whole point of this PR is to start collecting performance numbers for the conc=12288 recipe. Without the perf-changelog.yaml entry, the recipe will not run on push-to-main and no benchmark numbers will be collected, defeating the purpose of the change. Users discovering the gap later will need a follow-up PR.

How to fix it

Append a new entry to perf-changelog.yaml (after the last entry, which is the conc=8192 entry from PR #1209) following the same shape: config-keys: [dsv4-fp4-b300-sglang], a short description such as "Add conc=12288 recipe for 1k1k on dsv4-fp4-b300-sglang", and a pr-link to PR #1225.

Step-by-step proof

  1. Inspect this PR's changeset: git show 38d849d --stat shows only two files touched — .github/configs/nvidia-master.yaml (+1 line) and benchmarks/single_node/dsv4_fp4_b300_sglang.sh (+17/-3). perf-changelog.yaml is not in the changeset.
  2. Inspect AGENTS.md:192-196 ("Adding a New Benchmark Configuration") — step 2 explicitly says "Add corresponding entry to perf-changelog.yaml to trigger benchmark."
  3. Inspect AGENTS.md:296 — "MUST: Add an entry to perf-changelog.yaml."
  4. Inspect AGENTS.md:174 — "Changes to perf-changelog.yaml trigger benchmark runs."
  5. Inspect predecessor PR sglang-update #1209 (commit a3330bb), which added the conc=8192 row to the same config: it appended a corresponding perf-changelog.yaml entry (now the last block in the file) referencing dsv4-fp4-b300-sglang and the conc=8192 recipe.
  6. Conclusion: the conc=12288 recipe added here will not be picked up on push-to-main because no new perf-changelog.yaml entry was added, and the analogous predecessor PR followed the documented protocol that this PR skips.

# - isl: 8192
# osl: 1024
# search-space:
# - { tp: 8, ep: 1, conc-start: 1, conc-end: 1 }
# - { tp: 4, ep: 1, conc-start: 32, conc-end: 32 }
# - { tp: 4, ep: 4, dp-attn: true, conc-start: 512, conc-end: 512 }
# - { tp: 8, ep: 8, dp-attn: true, conc-start: 2048, conc-end: 2048 }
# - { tp: 8, ep: 8, dp-attn: true, conc-start: 4096, conc-end: 4096 }

# DeepSeek-V4-Pro on B300 with EAGLE/MTP speculative decoding. Recipe is
# selected inside benchmarks/single_node/dsv4_fp4_b300_sglang_mtp.sh by
Expand Down
Loading
Loading