Conversation
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
1 similar comment
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. |
| - { tp: 4, ep: 1, conc-start: 32, conc-end: 32 } | ||
| - { tp: 4, ep: 4, dp-attn: true, conc-start: 512, conc-end: 512 } | ||
| - { tp: 8, ep: 8, dp-attn: true, conc-start: 8192, conc-end: 8192 } | ||
| - { tp: 8, ep: 8, dp-attn: true, conc-start: 12288, conc-end: 12288 } |
There was a problem hiding this comment.
🔴 This PR adds a new conc=12288 recipe to dsv4-fp4-b300-sglang (new search-space row in nvidia-master.yaml plus matching CONC=12288 branch in dsv4_fp4_b300_sglang.sh) but does not add a corresponding entry to perf-changelog.yaml. Per AGENTS.md, perf-changelog.yaml is the documented trigger for benchmark runs on push-to-main, so without an entry the new recipe will not actually run. Please append a perf-changelog.yaml entry for dsv4-fp4-b300-sglang describing the new conc=12288 recipe and linking to this PR (mirroring the predecessor PR #1209 which added the conc=8192 entry).
Extended reasoning...
What the bug is and how it manifests
AGENTS.md is explicit that adding a new benchmark configuration requires updating perf-changelog.yaml. The section "Adding a New Benchmark Configuration" (AGENTS.md:192-196) lists "Add corresponding entry to perf-changelog.yaml to trigger benchmark" as step 2, the "Updating Docker Images" section (AGENTS.md:296) marks it as a MUST, and AGENTS.md:174 confirms "Changes to perf-changelog.yaml trigger benchmark runs." This PR adds a brand-new search-space row { tp: 8, ep: 8, dp-attn: true, conc-start: 12288, conc-end: 12288 } at .github/configs/nvidia-master.yaml:1886 plus the matching CONC=12288 branch in benchmarks/single_node/dsv4_fp4_b300_sglang.sh (with new tunings: CUDA_GRAPH_MAX_BS=1600, MAX_RUNNING_REQUESTS=12288, MEM_FRACTION_STATIC=0.72, SGLANG_OPT_USE_ONLINE_COMPRESS=1, etc.), but the changeset only touches those two files — perf-changelog.yaml is untouched.
The specific code path that triggers it
Per AGENTS.md, the sweep workflow on push-to-main keys off entries appended to perf-changelog.yaml to determine which configurations to (re)run. With no new changelog entry referencing dsv4-fp4-b300-sglang and the conc=12288 recipe, the workflow has no signal that there is anything new to benchmark, so the freshly added search-space row and script branch will sit dormant.
Why existing code doesn't prevent it
The repo does not have an automated check that cross-references new search-space rows in nvidia-master.yaml against perf-changelog.yaml entries. Compliance is enforced by convention/documentation (AGENTS.md) and prior practice. The directly analogous predecessor PR #1209 (commit a3330bb), which added the conc=8192 recipe to this exact same dsv4-fp4-b300-sglang config, did append a perf-changelog.yaml block (currently the last entry in the file) describing the new conc=8192 recipe and linking the PR — establishing the expected pattern that this PR omits.
Impact
The whole point of this PR is to start collecting performance numbers for the conc=12288 recipe. Without the perf-changelog.yaml entry, the recipe will not run on push-to-main and no benchmark numbers will be collected, defeating the purpose of the change. Users discovering the gap later will need a follow-up PR.
How to fix it
Append a new entry to perf-changelog.yaml (after the last entry, which is the conc=8192 entry from PR #1209) following the same shape: config-keys: [dsv4-fp4-b300-sglang], a short description such as "Add conc=12288 recipe for 1k1k on dsv4-fp4-b300-sglang", and a pr-link to PR #1225.
Step-by-step proof
- Inspect this PR's changeset:
git show 38d849d --statshows only two files touched —.github/configs/nvidia-master.yaml(+1 line) andbenchmarks/single_node/dsv4_fp4_b300_sglang.sh(+17/-3).perf-changelog.yamlis not in the changeset. - Inspect AGENTS.md:192-196 ("Adding a New Benchmark Configuration") — step 2 explicitly says "Add corresponding entry to perf-changelog.yaml to trigger benchmark."
- Inspect AGENTS.md:296 — "MUST: Add an entry to perf-changelog.yaml."
- Inspect AGENTS.md:174 — "Changes to perf-changelog.yaml trigger benchmark runs."
- Inspect predecessor PR sglang-update #1209 (commit
a3330bb), which added the conc=8192 row to the same config: it appended a correspondingperf-changelog.yamlentry (now the last block in the file) referencingdsv4-fp4-b300-sglangand the conc=8192 recipe. - Conclusion: the conc=12288 recipe added here will not be picked up on push-to-main because no new
perf-changelog.yamlentry was added, and the analogous predecessor PR followed the documented protocol that this PR skips.
|
/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang |
|
@yhyang201 Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25115567759 |
|
/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang |
|
@yhyang201 Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25121580534 |
|
hang, maybe crush |
|
/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang |
|
@yhyang201 Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25126029879 |
|
/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang |
|
@yhyang201 Kicking off a sweep. Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25126267486 |
sglang-update