Add SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1 to SGLang DSv4 launch configs#1246

Open
yhyang201 wants to merge 2 commits into main from add-multi-stream-overlap-dsv4-sglang

@yhyang201
Collaborator

Summary

  • Enable SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1 across all SGLang DeepSeek-V4 launch configurations
  • Single-node scripts: dsv4_fp4_b200.sh, dsv4_fp4_b300_sglang.sh, dsv4_fp4_b300_sglang_mtp.sh
  • Multi-node YAML recipes: conc1, conc512, conc512-20, conc1024, conc2048, conc16384 (both prefill and decode environments)

Test plan

  • Verify SGLang DSv4 single-node benchmarks launch successfully with the new env var
  • Verify multi-node disaggregated prefill/decode launches pick up the env var
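
A quick pre-launch check for the single-node scripts can be sketched as below. This is a minimal sketch, not part of the PR: the helper names `has_overlap_flag` and `check_scripts` are hypothetical, and it only covers the shell `export` form, not the YAML `prefill_environment` / `decode_environment` blocks.

```shell
# Hypothetical helper: succeeds if the given launch script exports
# SGLANG_OPT_USE_MULTI_STREAM_OVERLAP with the value 1.
has_overlap_flag() {
  grep -Eq '^[[:space:]]*export[[:space:]]+SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1([[:space:]]|$)' "$1"
}

# Hypothetical helper: prints every script that lacks the export and
# returns non-zero if at least one script is missing it.
check_scripts() {
  _missing=0
  for _script in "$@"; do
    if ! has_overlap_flag "$_script"; then
      echo "missing: $_script"
      _missing=1
    fi
  done
  return "$_missing"
}
```

For example, `check_scripts benchmarks/single_node/dsv4_fp4_b200.sh` prints nothing and returns 0 when the export is present.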

🤖 Generated with Claude Code

…onfigs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yhyang201 yhyang201 requested a review from a team May 1, 2026 04:06
@github-actions
Contributor

github-actions Bot commented May 1, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR first before we can merge your PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

export SGLANG_OPT_USE_JIT_INDEXER_METADATA=1
export SGLANG_OPT_USE_TOPK_V2=1
export SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2=1
export SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1
Contributor

🟡 This PR enables SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1 across 9 SGLang DSv4 launch configs (a perf-affecting toggle) but does not add a corresponding entry to perf-changelog.yaml. AGENTS.md (Updating Docker Images section) lists this as a MUST for env-var/configuration changes so that run-sweep.yml is triggered on push-to-main and the perf delta is captured. Please add a perf-changelog entry covering dsv4-fp4-b200-sglang, dsv4-fp4-b300-sglang*, and the disaggregated GB300 SGLang configs (precedent: PR #1187 at perf-changelog.yaml:1896, which added an analogous entry for other SGLANG_OPT_* knobs).

Extended reasoning...

What is the bug

This PR adds SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1 to 9 SGLang DSv4 launch configs:

  • 3 single-node scripts: benchmarks/single_node/dsv4_fp4_b200.sh, dsv4_fp4_b300_sglang.sh, dsv4_fp4_b300_sglang_mtp.sh
  • 6 multi-node 8k1k YAML recipes (conc1, conc512, conc512-20, conc1024, conc2048, conc16384), in both prefill_environment and decode_environment blocks.

This is a performance-affecting environment toggle (multi-stream overlap), but no entry was added to perf-changelog.yaml.

Why this is a bug (the documented rule)

AGENTS.md explicitly couples env-var/configuration changes to a perf-changelog.yaml entry. The 'Updating Docker Images' section requires:

  1. Update any related environment variables or configuration parameters
  2. MUST: Add an entry to perf-changelog.yaml

AGENTS.md further documents that perf-changelog.yaml is the trigger for benchmark runs:

'perf-changelog.yaml triggers which configs to benchmark'
'Changes to perf-changelog.yaml trigger benchmark runs'

So without an entry here, the push-to-main run-sweep.yml workflow will not pick up the affected SGLang DSv4 configs, and the perf impact of enabling multi-stream overlap on these recipes will not be measured.

Precedent for env-var-only entries

PR #1187 (perf-changelog.yaml:1896) sets the precedent for env-var-only changes: its description states it 'Adds SGLANG_OPT_* env knobs (SWA_SPLIT_LEAF_ON_INSERT, USE_JIT_NORM, USE_JIT_INDEXER_METADATA, USE_TOPK_V2, USE_CUSTOM_ALL_REDUCE_V2)' — exactly analogous to this PR turning on SGLANG_OPT_USE_MULTI_STREAM_OVERLAP. PR #1209 added another env-var-only entry. The example block in AGENTS.md itself shows env-var-only descriptions (e.g. 'Add VLLM_MXFP4_USE_MARLIN=1 environment variable') with wildcard config-keys.

Step-by-step proof

  1. Read the PR diff: 9 files changed; each adds exactly one line: SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1 (or the YAML equivalent). No file outside benchmarks/ is modified.
  2. Read the tail of perf-changelog.yaml: the most recent entry references PR #1242, "Add GB200 DSV4 Dynamo vLLM MTP2 recipes" (matching the latest commit ef5dee4). There is no entry referencing PR #1246 or SGLANG_OPT_USE_MULTI_STREAM_OVERLAP.
  3. Cross-check AGENTS.md 'Updating Docker Images': the step marked MUST requires a perf-changelog.yaml entry whenever env vars / configuration parameters change.
  4. Therefore the PR violates a documented MUST.
  5. Operational consequence: when this lands on main, run-sweep.yml filters configs by perf-changelog.yaml diffs; the absence of a matching entry means the affected configs (dsv4-fp4-b200-sglang, dsv4-fp4-b300-sglang, dsv4-fp4-b300-sglang-mtp, and the GB300 disaggregated 8k1k SGLang configs) will not be re-benchmarked, so the perf delta from enabling multi-stream overlap is silently lost.
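
The rule this proof relies on can be sketched as a local check over a list of changed paths. This is only an illustration under stated assumptions: the function name `needs_changelog_entry` is hypothetical, perf-changelog.yaml is assumed to sit at the repository root, and the actual filtering logic in run-sweep.yml is not shown in this thread and may differ.

```shell
# Hypothetical check: reads changed file paths on stdin, one per line.
# Succeeds (exit 0) when files under benchmarks/ changed but
# perf-changelog.yaml did not, i.e. when a changelog entry is missing.
needs_changelog_entry() {
  _bench=0
  _changelog=0
  while IFS= read -r _path; do
    case "$_path" in
      perf-changelog.yaml) _changelog=1 ;;  # assumed to live at the repo root
      benchmarks/*)        _bench=1 ;;
    esac
  done
  [ "$_bench" -eq 1 ] && [ "$_changelog" -eq 0 ]
}
```

One way to use it is `git diff --name-only origin/main...HEAD | needs_changelog_entry && echo "add a perf-changelog.yaml entry"`.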

Impact

Process violation, not a runtime bug. No code path breaks at runtime; the harm is that the team loses the automated perf measurement for the very change this PR is trying to introduce, defeating the point of the toggle.

How to fix

Add a single entry to perf-changelog.yaml (modeled on the PR #1187 entry at line 1896) that lists the affected configs (or appropriate wildcards: dsv4-fp4-b200-sglang, dsv4-fp4-b300-sglang*, and the GB300 disaggregated SGLang 8k1k configs) with a description such as 'Add SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1 to SGLang DSv4 launch configs'.

…1246

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>