Skip to content

sglang-update#1225

Open
yhyang201 wants to merge 6 commits intomainfrom
sglang-update-conc12288
Open

sglang-update#1225
yhyang201 wants to merge 6 commits intomainfrom
sglang-update-conc12288

Conversation

@yhyang201
Copy link
Copy Markdown
Collaborator

sglang-update

@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

1 similar comment
@github-actions
Copy link
Copy Markdown
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

- { tp: 4, ep: 1, conc-start: 32, conc-end: 32 }
- { tp: 4, ep: 4, dp-attn: true, conc-start: 512, conc-end: 512 }
- { tp: 8, ep: 8, dp-attn: true, conc-start: 8192, conc-end: 8192 }
- { tp: 8, ep: 8, dp-attn: true, conc-start: 12288, conc-end: 12288 }
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 This PR adds a new conc=12288 recipe to dsv4-fp4-b300-sglang (new search-space row in nvidia-master.yaml plus matching CONC=12288 branch in dsv4_fp4_b300_sglang.sh) but does not add a corresponding entry to perf-changelog.yaml. Per AGENTS.md, perf-changelog.yaml is the documented trigger for benchmark runs on push-to-main, so without an entry the new recipe will not actually run. Please append a perf-changelog.yaml entry for dsv4-fp4-b300-sglang describing the new conc=12288 recipe and linking to this PR (mirroring the predecessor PR #1209 which added the conc=8192 entry).

Extended reasoning...

What the bug is and how it manifests

AGENTS.md is explicit that adding a new benchmark configuration requires updating perf-changelog.yaml. The section "Adding a New Benchmark Configuration" (AGENTS.md:192-196) lists "Add corresponding entry to perf-changelog.yaml to trigger benchmark" as step 2, the "Updating Docker Images" section (AGENTS.md:296) marks it as a MUST, and AGENTS.md:174 confirms "Changes to perf-changelog.yaml trigger benchmark runs." This PR adds a brand-new search-space row { tp: 8, ep: 8, dp-attn: true, conc-start: 12288, conc-end: 12288 } at .github/configs/nvidia-master.yaml:1886 plus the matching CONC=12288 branch in benchmarks/single_node/dsv4_fp4_b300_sglang.sh (with new tunings: CUDA_GRAPH_MAX_BS=1600, MAX_RUNNING_REQUESTS=12288, MEM_FRACTION_STATIC=0.72, SGLANG_OPT_USE_ONLINE_COMPRESS=1, etc.), but the changeset only touches those two files — perf-changelog.yaml is untouched.

The specific code path that triggers it

Per AGENTS.md, the sweep workflow on push-to-main keys off entries appended to perf-changelog.yaml to determine which configurations to (re)run. With no new changelog entry referencing dsv4-fp4-b300-sglang and the conc=12288 recipe, the workflow has no signal that there is anything new to benchmark, so the freshly added search-space row and script branch will sit dormant.

Why existing code doesn't prevent it

The repo does not have an automated check that cross-references new search-space rows in nvidia-master.yaml against perf-changelog.yaml entries. Compliance is enforced by convention/documentation (AGENTS.md) and prior practice. The directly analogous predecessor PR #1209 (commit a3330bb), which added the conc=8192 recipe to this exact same dsv4-fp4-b300-sglang config, did append a perf-changelog.yaml block (currently the last entry in the file) describing the new conc=8192 recipe and linking the PR — establishing the expected pattern that this PR omits.

Impact

The whole point of this PR is to start collecting performance numbers for the conc=12288 recipe. Without the perf-changelog.yaml entry, the recipe will not run on push-to-main and no benchmark numbers will be collected, defeating the purpose of the change. Users discovering the gap later will need a follow-up PR.

How to fix it

Append a new entry to perf-changelog.yaml (after the last entry, which is the conc=8192 entry from PR #1209) following the same shape: config-keys: [dsv4-fp4-b300-sglang], a short description such as "Add conc=12288 recipe for 1k1k on dsv4-fp4-b300-sglang", and a pr-link to PR #1225.

Step-by-step proof

  1. Inspect this PR's changeset: git show 38d849d --stat shows only two files touched — .github/configs/nvidia-master.yaml (+1 line) and benchmarks/single_node/dsv4_fp4_b300_sglang.sh (+17/-3). perf-changelog.yaml is not in the changeset.
  2. Inspect AGENTS.md:192-196 ("Adding a New Benchmark Configuration") — step 2 explicitly says "Add corresponding entry to perf-changelog.yaml to trigger benchmark."
  3. Inspect AGENTS.md:296 — "MUST: Add an entry to perf-changelog.yaml."
  4. Inspect AGENTS.md:174 — "Changes to perf-changelog.yaml trigger benchmark runs."
  5. Inspect predecessor PR sglang-update #1209 (commit a3330bb), which added the conc=8192 row to the same config: it appended a corresponding perf-changelog.yaml entry (now the last block in the file) referencing dsv4-fp4-b300-sglang and the conc=8192 recipe.
  6. Conclusion: the conc=12288 recipe added here will not be picked up on push-to-main because no new perf-changelog.yaml entry was added, and the analogous predecessor PR followed the documented protocol that this PR skips.

@yhyang201
Copy link
Copy Markdown
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Copy link
Copy Markdown
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25115567759
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 3002f78
Approval: not required (trusted collaborator).

@yhyang201
Copy link
Copy Markdown
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Copy link
Copy Markdown
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25121580534
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 3002f78
Approval: not required (trusted collaborator).

@yhyang201
Copy link
Copy Markdown
Collaborator Author

hang, maybe crush

@yhyang201
Copy link
Copy Markdown
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Copy link
Copy Markdown
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25126029879
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: 1489aae
Approval: not required (trusted collaborator).

@yhyang201
Copy link
Copy Markdown
Collaborator Author

/sweep test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang

@github-actions
Copy link
Copy Markdown
Contributor

@yhyang201 Kicking off a sweep.

Run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25126267486
Command: test-config --config-files .github/configs/nvidia-master.yaml --config-keys dsv4-fp4-b300-sglang
Pinned ref: cec3e7e
Approval: not required (trusted collaborator).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant