@@ -87,6 +87,7 @@ backend:
SGLANG_OPT_USE_JIT_NORM: "1"
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"
SGLANG_OPT_USE_MULTI_STREAM_OVERLAP: "1"
NCCL_MNNVL_ENABLE: "1"
NCCL_CUMEM_ENABLE: "1"
SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"
@@ -104,6 +105,7 @@ backend:
SGLANG_OPT_USE_JIT_NORM: "1"
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"
SGLANG_OPT_USE_MULTI_STREAM_OVERLAP: "1"
NCCL_MNNVL_ENABLE: "1"
NCCL_CUMEM_ENABLE: "1"
SGLANG_MOONCAKE_CUSTOM_MEM_POOL: "True"

@@ -89,6 +89,7 @@ backend:
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"
SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2: "1"
SGLANG_OPT_USE_MULTI_STREAM_OVERLAP: "1"
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE: "1"
SGLANG_OPT_FIX_HASH_MEGA_MOE: "1"
SGLANG_OPT_USE_FAST_MASK_EP: "1"
@@ -118,6 +119,7 @@ backend:
SGLANG_OPT_USE_JIT_NORM: "1"
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"
SGLANG_OPT_USE_MULTI_STREAM_OVERLAP: "1"
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE: "1"
SGLANG_OPT_FIX_HASH_MEGA_MOE: "1"
SGLANG_OPT_USE_FAST_MASK_EP: "1"

@@ -89,6 +89,7 @@ backend:
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"
SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2: "1"
SGLANG_OPT_USE_MULTI_STREAM_OVERLAP: "1"
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE: "1"
SGLANG_OPT_FIX_HASH_MEGA_MOE: "1"
SGLANG_OPT_USE_FAST_MASK_EP: "1"
@@ -118,6 +119,7 @@ backend:
SGLANG_OPT_USE_JIT_NORM: "1"
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"
SGLANG_OPT_USE_MULTI_STREAM_OVERLAP: "1"
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE: "1"
SGLANG_OPT_FIX_HASH_MEGA_MOE: "1"
SGLANG_OPT_USE_FAST_MASK_EP: "1"

@@ -89,6 +89,7 @@ backend:
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"
SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2: "1"
SGLANG_OPT_USE_MULTI_STREAM_OVERLAP: "1"
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE: "1"
SGLANG_OPT_FIX_HASH_MEGA_MOE: "1"
SGLANG_OPT_USE_FAST_MASK_EP: "1"
@@ -118,6 +119,7 @@ backend:
SGLANG_OPT_USE_JIT_NORM: "1"
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"
SGLANG_OPT_USE_MULTI_STREAM_OVERLAP: "1"
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE: "1"
SGLANG_OPT_FIX_HASH_MEGA_MOE: "1"
SGLANG_OPT_USE_FAST_MASK_EP: "1"

@@ -89,6 +89,7 @@ backend:
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"
SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2: "1"
SGLANG_OPT_USE_MULTI_STREAM_OVERLAP: "1"
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE: "1"
SGLANG_OPT_FIX_HASH_MEGA_MOE: "1"
SGLANG_OPT_USE_FAST_MASK_EP: "1"
@@ -118,6 +119,7 @@ backend:
SGLANG_OPT_USE_JIT_NORM: "1"
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"
SGLANG_OPT_USE_MULTI_STREAM_OVERLAP: "1"
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE: "1"
SGLANG_OPT_FIX_HASH_MEGA_MOE: "1"
SGLANG_OPT_USE_FAST_MASK_EP: "1"

@@ -89,6 +89,7 @@ backend:
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"
SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2: "1"
SGLANG_OPT_USE_MULTI_STREAM_OVERLAP: "1"
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE: "1"
SGLANG_OPT_FIX_HASH_MEGA_MOE: "1"
SGLANG_OPT_USE_FAST_MASK_EP: "1"
@@ -118,6 +119,7 @@ backend:
SGLANG_OPT_USE_JIT_NORM: "1"
SGLANG_OPT_USE_JIT_INDEXER_METADATA: "1"
SGLANG_OPT_USE_TOPK_V2: "1"
SGLANG_OPT_USE_MULTI_STREAM_OVERLAP: "1"
SGLANG_OPT_USE_DEEPGEMM_MEGA_MOE: "1"
SGLANG_OPT_FIX_HASH_MEGA_MOE: "1"
SGLANG_OPT_USE_FAST_MASK_EP: "1"

1 change: 1 addition & 0 deletions benchmarks/single_node/dsv4_fp4_b200.sh
@@ -27,6 +27,7 @@ export SGLANG_OPT_USE_JIT_NORM=1
export SGLANG_OPT_USE_JIT_INDEXER_METADATA=1
export SGLANG_OPT_USE_TOPK_V2=1
export SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2=1
export SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1

🟡 This PR enables SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1 across 9 SGLang DSv4 launch configs (a perf-affecting toggle) but does not add a corresponding entry to perf-changelog.yaml. AGENTS.md (Updating Docker Images section) lists this as a MUST for env-var/configuration changes so that run-sweep.yml is triggered on push-to-main and the perf delta is captured. Please add a perf-changelog entry covering dsv4-fp4-b200-sglang, dsv4-fp4-b300-sglang*, and the disaggregated GB300 SGLang configs (precedent: PR #1187 at perf-changelog.yaml:1896, which added an analogous entry for other SGLANG_OPT_* knobs).

Extended reasoning...

What is the bug

This PR adds SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1 to 9 SGLang DSv4 launch configs:

  • 3 single-node scripts: benchmarks/single_node/dsv4_fp4_b200.sh, dsv4_fp4_b300_sglang.sh, dsv4_fp4_b300_sglang_mtp.sh
  • 6 multi-node 8k1k YAML recipes (conc1, conc512, conc512-20, conc1024, conc2048, conc16384), in both prefill_environment and decode_environment blocks.

This is a performance-affecting environment toggle (multi-stream overlap), but no entry was added to perf-changelog.yaml.
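A reviewer could spot-check the scope described above with a small counting helper. This is only a sketch: `count_toggle` is a hypothetical name, and the pattern assumes the toggle appears either as `NAME=1` in the shell scripts or as `NAME: "1"` in the YAML recipes, as in this diff.

```shell
# Hypothetical helper (not part of the repository): print how many lines
# in a file set the toggle, in shell form (NAME=1) or YAML form (NAME: "1").
count_toggle() {
  # $1 = path to a launch script or recipe.
  # grep -c prints the number of matching lines; it exits non-zero when
  # that number is 0, so "|| true" keeps the helper usable under "set -e".
  grep -c -E 'SGLANG_OPT_USE_MULTI_STREAM_OVERLAP(=1|: "1")' "$1" || true
}

# Print a count for every file passed on the command line (none by default).
for f in "$@"; do
  echo "$f: $(count_toggle "$f")"
done
```

Under the description above, a single-node script should report 1 and a multi-node recipe should report 2, one for each of the prefill_environment and decode_environment blocks.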

Why this is a bug (the documented rule)

AGENTS.md explicitly couples env-var/configuration changes to a perf-changelog.yaml entry. The 'Updating Docker Images' section requires:

  1. Update any related environment variables or configuration parameters
  2. MUST: Add an entry to perf-changelog.yaml

AGENTS.md further documents that perf-changelog.yaml is the trigger for benchmark runs:

'perf-changelog.yaml triggers which configs to benchmark'
'Changes to perf-changelog.yaml trigger benchmark runs'

So without an entry here, the push-to-main run-sweep.yml workflow will not pick up the affected SGLang DSv4 configs, and the perf impact of enabling multi-stream overlap on these recipes will not be measured.

Precedent for env-var-only entries

PR #1187 (perf-changelog.yaml:1896) sets the precedent for env-var-only changes: its description states it 'Adds SGLANG_OPT_* env knobs (SWA_SPLIT_LEAF_ON_INSERT, USE_JIT_NORM, USE_JIT_INDEXER_METADATA, USE_TOPK_V2, USE_CUSTOM_ALL_REDUCE_V2)' — exactly analogous to this PR turning on SGLANG_OPT_USE_MULTI_STREAM_OVERLAP. PR #1209 added another env-var-only entry. The example block in AGENTS.md itself shows env-var-only descriptions (e.g. 'Add VLLM_MXFP4_USE_MARLIN=1 environment variable') with wildcard config-keys.

Step-by-step proof

  1. Read the PR diff: 9 files changed; each single-node script adds exactly one line, SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1, and each YAML recipe adds the equivalent key to both its prefill_environment and decode_environment blocks. No file outside benchmarks/ is modified.
  2. Read the tail of perf-changelog.yaml: the most recent entry references PR #1242, 'Add GB200 DSV4 Dynamo vLLM MTP2 recipes' (matching the latest commit ef5dee4). There is no entry referencing PR #1246, 'Add SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1 to SGLang DSv4 launch configs', or the SGLANG_OPT_USE_MULTI_STREAM_OVERLAP variable.
  3. Cross-check AGENTS.md 'Updating Docker Images': the section makes adding a perf-changelog.yaml entry a MUST whenever env vars / configuration parameters change.
  4. Therefore the PR violates a documented MUST.
  5. Operational consequence: when this lands on main, run-sweep.yml filters configs by perf-changelog.yaml diffs; the absence of a matching entry means the affected configs (dsv4-fp4-b200-sglang, dsv4-fp4-b300-sglang, dsv4-fp4-b300-sglang-mtp, and the GB300 disaggregated 8k1k SGLang configs) will not be re-benchmarked, so the perf delta from enabling multi-stream overlap is silently lost.
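Step 2 of the proof above can be reproduced mechanically. The following is a minimal sketch, assuming it is run from the repository root where perf-changelog.yaml lives; `changelog_mentions` is a hypothetical helper name, not something the repository provides.

```shell
# Hypothetical helper: succeed when the given file contains the given
# literal string anywhere.
changelog_mentions() {
  # $1 = file to search, $2 = literal text to look for.
  # -F treats the text as a fixed string, -q suppresses output,
  # so only the exit status carries the answer.
  grep -Fq -- "$2" "$1"
}

# Example check; skipped when the changelog is not in the current directory.
if [ -f perf-changelog.yaml ]; then
  if changelog_mentions perf-changelog.yaml "SGLANG_OPT_USE_MULTI_STREAM_OVERLAP"; then
    echo "entry found"
  else
    echo "entry missing"
  fi
fi
```

A literal string search only shows that the variable is mentioned somewhere in the changelog; it does not confirm that the matching entry lists the right config-keys.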

Impact

Process violation, not a runtime bug. No code path breaks at runtime; the harm is that the team loses the automated perf measurement for the very change this PR is trying to introduce, defeating the point of the toggle.

How to fix

Add a single entry to perf-changelog.yaml (modeled on the PR #1187 entry at line 1896) that lists the affected configs (or appropriate wildcards: dsv4-fp4-b200-sglang, dsv4-fp4-b300-sglang*, and the GB300 disaggregated SGLang 8k1k configs) with a description such as 'Add SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1 to SGLang DSv4 launch configs'.
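To see which config names a wildcard key such as dsv4-fp4-b300-sglang* would cover, shell glob matching gives a rough approximation. This sketch assumes the changelog wildcards behave like shell patterns, which may differ from how run-sweep.yml actually expands them; `config_matches_key` is a hypothetical helper.

```shell
# Hypothetical helper: succeed when a config name matches a changelog key,
# treating '*' in the key as a shell glob. This is an assumption about
# wildcard semantics, not the workflow's real matching logic.
config_matches_key() {
  # $1 = config name, $2 = key pattern (may contain '*').
  # The unquoted $2 is interpreted as a pattern by "case".
  case "$1" in
    $2) return 0 ;;
    *)  return 1 ;;
  esac
}

# Check the three single-node config names against one wildcard key.
for name in dsv4-fp4-b200-sglang dsv4-fp4-b300-sglang dsv4-fp4-b300-sglang-mtp; do
  if config_matches_key "$name" 'dsv4-fp4-b300-sglang*'; then
    echo "$name: covered"
  else
    echo "$name: not covered"
  fi
done
```

Under these glob rules the wildcard covers both B300 configs but not dsv4-fp4-b200-sglang, which is why the suggested entry lists the B200 key separately.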


# TODO(Cam): the lmsysorg/sglang:deepseek-v4-blackwell image installs sglang
# editable at /workspace/sglang/python; prior sglang tags used /sgl-workspace/sglang.

1 change: 1 addition & 0 deletions benchmarks/single_node/dsv4_fp4_b300_sglang.sh
@@ -31,6 +31,7 @@ export SGLANG_OPT_USE_JIT_NORM=1
export SGLANG_OPT_USE_JIT_INDEXER_METADATA=1
export SGLANG_OPT_USE_TOPK_V2=1
export SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2=1
export SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1

# TODO(Cam): the deepseek-v4 sglang images install sglang editable at
# /workspace/sglang/python; prior sglang tags used /sgl-workspace/sglang.

1 change: 1 addition & 0 deletions benchmarks/single_node/dsv4_fp4_b300_sglang_mtp.sh
@@ -42,6 +42,7 @@ export SGLANG_OPT_USE_JIT_NORM=1
export SGLANG_OPT_USE_JIT_INDEXER_METADATA=1
export SGLANG_OPT_USE_TOPK_V2=1
export SGLANG_OPT_USE_CUSTOM_ALL_REDUCE_V2=1
export SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1

# TODO(Cam): the deepseek-v4 sglang images install sglang editable at
# /workspace/sglang/python; prior sglang tags used /sgl-workspace/sglang.

11 changes: 11 additions & 0 deletions perf-changelog.yaml
@@ -2063,3 +2063,14 @@
    - "Recipes cover 8k/1k aggregate TP8 low-latency conc=1, low-latency bridge 1P DEP8 + 4D TP8 no-offload conc=16/32/64, mid 1P/1D DEP8 MegaMOE conc=128, and high-throughput 2P/1D DEP8 MegaMOE conc=1024"
    - "All recipes enable FP4 indexer cache and speculative-config mtp with num_speculative_tokens=2"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1242

- config-keys:
    - dsv4-fp4-b200-sglang
    - dsv4-fp4-b300-sglang
    - dsv4-fp4-b300-sglang-mtp
    - dsv4-fp4-gb300-dynamo-sglang
  description:
    - "Add SGLANG_OPT_USE_MULTI_STREAM_OVERLAP=1 to all SGLang DeepSeek-V4 launch configurations"
    - "Single-node: dsv4_fp4_b200.sh, dsv4_fp4_b300_sglang.sh, dsv4_fp4_b300_sglang_mtp.sh"
    - "Multi-node: conc1, conc512, conc512-20, conc1024, conc2048, conc16384 (both prefill and decode environments)"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1246