Add SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1 to SGLang DSv4 launch configs#1246
…onfigs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR first, before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that all GitHub Action jobs fully pass after merging. Failures are often just flakes, and re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring that they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
export SGLANG_OPT_USE_JIT_INDEXER_METADATA=1
export SGLANG_OPT_USE_TOPK_V2=1
export SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2=1
export SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1
🟡 This PR enables SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1 across 9 SGLang DSv4 launch configs (a perf-affecting toggle) but does not add a corresponding entry to perf-changelog.yaml. AGENTS.md (Updating Docker Images section) lists this as a MUST for env-var/configuration changes so that run-sweep.yml is triggered on push-to-main and the perf delta is captured. Please add a perf-changelog entry covering dsv4-fp4-b200-sglang, dsv4-fp4-b300-sglang*, and the disaggregated GB300 SGLang configs (precedent: PR #1187 at perf-changelog.yaml:1896, which added an analogous entry for other SGLANG_OPT_* knobs).
Extended reasoning...
What is the bug
This PR adds SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1 to 9 SGLang DSv4 launch configs:
- 3 single-node scripts: benchmarks/single_node/dsv4_fp4_b200.sh, dsv4_fp4_b300_sglang.sh, dsv4_fp4_b300_sglang_mtp.sh
- 6 multi-node 8k1k YAML recipes (conc1, conc512, conc512-20, conc1024, conc2048, conc16384), in both prefill_environment and decode_environment blocks.
This is a performance-affecting environment toggle (multi-stream overlap), but no entry was added to perf-changelog.yaml.
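The file list above can be spot-checked mechanically. The following is a minimal sketch in POSIX shell, not part of the PR: the helper name `missing_flag` is hypothetical, and it only tests whether the variable name appears anywhere in each file, which is assumed to cover both the `export` form in scripts and the YAML form in recipes.

```shell
# Hypothetical helper: print every file (passed as arguments) that does
# NOT mention the multi-stream overlap flag. Unreadable or missing files
# are reported too, because grep fails on them.
FLAG='SGLANG_OPT_USE_MULTI_STREAM_OVERLAP'

missing_flag() {
  for f in "$@"; do
    # A plain substring match; it does not check that the value is 1.
    grep -q "$FLAG" "$f" || printf '%s\n' "$f"
  done
}

# Example call from the repository root (only the first path appears in
# full in the review; the others would need their real locations):
#   missing_flag benchmarks/single_node/dsv4_fp4_b200.sh
```

An empty result means every listed file mentions the flag; any printed path is a file to inspect.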
Why this is a bug (the documented rule)
AGENTS.md explicitly couples env-var/configuration changes to a perf-changelog.yaml entry. The 'Updating Docker Images' section requires:
- Update any related environment variables or configuration parameters
- MUST: Add an entry to perf-changelog.yaml
AGENTS.md further documents that perf-changelog.yaml is the trigger for benchmark runs:
'perf-changelog.yaml triggers which configs to benchmark'
'Changes to perf-changelog.yaml trigger benchmark runs'
So without an entry here, the push-to-main run-sweep.yml workflow will not pick up the affected SGLang DSv4 configs, and the perf impact of enabling multi-stream overlap on these recipes will not be measured.
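To illustrate why a missing entry means a config is skipped, here is a minimal sketch of wildcard matching between changelog config-keys and config names. It is an assumption that run-sweep.yml uses glob-style matching like this; the function name `matches_any` is hypothetical and does not come from the repository.

```shell
# Hypothetical helper: succeed (exit 0) if config name $1 matches any of
# the glob patterns given as the remaining arguments.
matches_any() {
  name=$1
  shift
  for pattern in "$@"; do
    case $name in
      # Unquoted on purpose so that * in the pattern acts as a wildcard.
      $pattern) return 0 ;;
    esac
  done
  return 1
}

# With the keys suggested in the review, the MTP variant is covered by
# the wildcard entry:
if matches_any dsv4-fp4-b300-sglang-mtp 'dsv4-fp4-b200-sglang' 'dsv4-fp4-b300-sglang*'; then
  echo "would be benchmarked"
fi
```

Under this model, a config whose name matches no key in the changelog diff is never selected, which is the gap the review describes.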
Precedent for env-var-only entries
PR #1187 (perf-changelog.yaml:1896) sets the precedent for env-var-only changes: its description states it 'Adds SGLANG_OPT_* env knobs (SWA_SPLIT_LEAF_ON_INSERT, USE_JIT_NORM, USE_JIT_INDEXER_METADATA, USE_TOPK_V2, USE_CUSTOM_ALL_REDUCE_V2)' — exactly analogous to this PR turning on SGLANG_OPT_USE_MULTI_STREAM_OVERLAP. PR #1209 added another env-var-only entry. The example block in AGENTS.md itself shows env-var-only descriptions (e.g. 'Add VLLM_MXFP4_USE_MARLIN=1 environment variable') with wildcard config-keys.
Step-by-step proof
- Read the PR diff: 9 files changed; each adds exactly one line: SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1 (or the YAML equivalent). No file outside benchmarks/ is modified.
- Read the tail of perf-changelog.yaml: the most recent entry references PR #1242, "Add GB200 DSV4 Dynamo vLLM MTP2 recipes" (matching the latest commit ef5dee4). There is no entry referencing PR #1246 or SGLANG_OPT_USE_MULTI_STREAM_OVERLAP.
- Cross-check AGENTS.md 'Updating Docker Images': step 3 is a MUST to add a perf-changelog.yaml entry whenever env vars / configuration parameters change.
- Therefore the PR violates a documented MUST.
- Operational consequence: when this lands on main, run-sweep.yml filters configs by perf-changelog.yaml diffs; the absence of a matching entry means the affected configs (dsv4-fp4-b200-sglang, dsv4-fp4-b300-sglang, dsv4-fp4-b300-sglang-mtp, and the GB300 disaggregated 8k1k SGLang configs) will not be re-benchmarked, so the perf delta from enabling multi-stream overlap is silently lost.
Impact
Process violation, not a runtime bug. No code path breaks at runtime; the harm is that the team loses the automated perf measurement for the very change this PR is trying to introduce, defeating the point of the toggle.
How to fix
Add a single entry to perf-changelog.yaml (modeled on the PR #1187 entry at line 1896) that lists the affected configs (or appropriate wildcards: dsv4-fp4-b200-sglang, dsv4-fp4-b300-sglang*, and the GB300 disaggregated SGLang 8k1k configs) with a description such as 'Add SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1 to SGLang DSv4 launch configs'.
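As a rough illustration of such an entry, the sketch below prints a candidate YAML fragment for review before it is pasted into perf-changelog.yaml. The field names `config-keys` and `description` are assumptions based on how the review describes the AGENTS.md example; the real schema may require additional fields, so the PR #1187 entry remains the model to copy. The GB300 disaggregated config keys are not named in this thread and are left as a placeholder.

```shell
# Hypothetical helper: print a candidate changelog entry to stdout.
# Field names are assumed, not confirmed against perf-changelog.yaml.
print_changelog_entry() {
  cat <<'EOF'
- config-keys:
    - dsv4-fp4-b200-sglang
    - dsv4-fp4-b300-sglang*
    # TODO: add the GB300 disaggregated SGLang 8k1k config keys here
  description:
    - "Add SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1 to SGLang DSv4 launch configs"
EOF
}

print_changelog_entry
```

Printing to stdout rather than appending to the file keeps the step reviewable, since the exact indentation and required fields should be checked against existing entries first.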
…1246 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Add SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1 across all SGLang DeepSeek-V4 launch configurations
- Single-node scripts: dsv4_fp4_b200.sh, dsv4_fp4_b300_sglang.sh, dsv4_fp4_b300_sglang_mtp.sh
- Multi-node 8k1k YAML recipes: conc1, conc512, conc512-20, conc1024, conc2048, conc16384 (both prefill and decode environments)
Test plan
🤖 Generated with Claude Code