[NVIDIA] Update Qwen3.5 STP recipe on B200 #758
Conversation
@claude please write a PR description based on the diffs here
Claude finished @kedarpotdar-nv's task in 40s: Writing PR Description
PR description has been updated with a summary of all changes in this PR.
functionstackx left a comment:
Mostly LGTM! Please update https://github.com/sgl-project/sgl-cookbook before we can merge this PR, since InferenceMAX is about measuring the performance that end users can get. If these flags aren't documented, then they aren't fully accessible to end users.
Hi @functionstackx, can we merge this? I will continue working on updating the SGLang recipe in the background with my team.
#760

PR WIP here: sgl-project/sgl-cookbook#169
Thanks @kedarpotdar-nv. Can you link the SGLang Qwen3.5 recipe to the NVIDIA recipes page too, since it displays pretty high in Google search: https://docs.nvidia.com/llm-inference-quick-start-recipes/index.html
Looks like this needs a complex rebase.
@kedarpotdar-nv can you try this prompt with Claude?

@claude can you move benchmarks/qwen3.5_bf16_b200.sh to benchmarks/single_node? Look at how it was done after the recent refactor, like the other B200 SGLang scripts in benchmarks/single_node.
|
Claude finished @kedarpotdar-nv's task in 4m 50s: Moving and Refactoring
Align with the multi-node reorganization (PR #666):
- Move the script to the benchmarks/single_node/ directory
- Update the source path to reference the parent benchmark_lib.sh
- Add EP_SIZE and data-parallel-size params for consistency
- Reformat the server launch command to match other B200 SGLang scripts

Co-authored-by: Kedar Potdar <kedarpotdar-nv@users.noreply.github.com>
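As a side note on the second bullet, the one mechanical change the move forces is the source path: once the script sits in benchmarks/single_node/, benchmark_lib.sh is one directory up. A minimal sketch of that path change follows; the helper name lib_path_for is illustrative, not the repo's actual code.

```shell
#!/usr/bin/env bash
# Illustrative sketch (not the repo's code): after moving the script into
# benchmarks/single_node/, benchmark_lib.sh lives one directory up, so it
# must be sourced from the parent directory rather than alongside the script.
lib_path_for() {
  # $1 = directory containing the benchmark script
  printf '%s/../benchmark_lib.sh\n' "$1"
}

lib_path_for "benchmarks/single_node"
# → benchmarks/single_node/../benchmark_lib.sh
# In the moved script this becomes roughly:
#   source "$(dirname "$0")/../benchmark_lib.sh"
```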
This is good to merge @functionstackx. Perf is lining up after the refactor.
LGTM @kedarpotdar-nv. Ideally you can add more runners to the cluster before merging, since merging before you do will cause a traffic jam.
@functionstackx multiple runners are being tested right now. @csahithi will open a PR shortly with improved cleanup logic to enable multiple runners. I don't want to keep this held up for that.
FWIW, this full sweep only takes 2 hours.
@functionstackx can you merge whenever? I have stopped the MiniMax run for this to complete; it shouldn't take long.

Summary
SGLang cookbook WIP PR here: sgl-project/sgl-cookbook#169
Updates the Qwen3.5-397B-A17B BF16 SGLang benchmark launch configuration on B200 with optimized server parameters and environment tuning for improved performance.
Changes
Benchmark script (benchmarks/qwen3.5_bf16_b200.sh):
- Environment: NCCL_NVLS_ENABLE=1, SGL_ENABLE_JIT_DEEPGEMM=false, SGLANG_ENABLE_FLASHINFER_GEMM=true, PYTHONUNBUFFERED=1
- trtllm_mha attention backend and flashinfer_trtllm MOE runner
- mem-fraction-static raised from 0.80 to 0.82
- chunked-prefill-size and max-prefill-tokens (both 32768)
- cuda-graph-max-bs set to match the concurrency level
- ISL + OSL + 20
- scheduler-recv-interval (10 for low-latency conc ≤ 8, 30 for max-throughput conc ≥ 16)
- Added flags: --disable-radix-cache, --served-model-name, --trust-remote-code, --tokenizer-worker-num 6, --stream-interval 30, --max-running-requests 128, --enable-flashinfer-allreduce-fusion

Perf changelog (perf-changelog.yaml): qwen3.5-bf16-b200-sglang

Image: lmsysorg/sglang:nightly-dev-20260216-d3bae71e (unchanged from initial config)
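For readers skimming the change list, the tuning above could be assembled into a launch roughly like the sketch below. This is a hedged illustration built only from the bullets in this PR, not a copy of the actual script: the model path, served name, and the exact spellings of the attention-backend and MoE-runner flags for this sglang build are assumptions.

```shell
#!/usr/bin/env bash
# Hedged sketch of the tuned launch. Model path, served name, and the
# attention/MoE backend flag spellings are assumptions, not the real script.
export NCCL_NVLS_ENABLE=1
export SGL_ENABLE_JIT_DEEPGEMM=false
export SGLANG_ENABLE_FLASHINFER_GEMM=true
export PYTHONUNBUFFERED=1

CONCURRENCY="${CONCURRENCY:-8}"
# scheduler-recv-interval: 10 for low-latency (conc <= 8),
# 30 for max-throughput (conc >= 16), per the change list above.
if [ "${CONCURRENCY}" -le 8 ]; then
  RECV_INTERVAL=10
else
  RECV_INTERVAL=30
fi

python3 -m sglang.launch_server \
  --model-path Qwen/Qwen3.5-397B-A17B \
  --served-model-name qwen3.5 \
  --trust-remote-code \
  --attention-backend trtllm_mha \
  --moe-runner-backend flashinfer_trtllm \
  --mem-fraction-static 0.82 \
  --chunked-prefill-size 32768 \
  --max-prefill-tokens 32768 \
  --cuda-graph-max-bs "${CONCURRENCY}" \
  --scheduler-recv-interval "${RECV_INTERVAL}" \
  --disable-radix-cache \
  --tokenizer-worker-num 6 \
  --stream-interval 30 \
  --max-running-requests 128 \
  --enable-flashinfer-allreduce-fusion
```

The one piece of logic worth calling out is the concurrency-conditional scheduler-recv-interval, which is why the sketch computes RECV_INTERVAL before launching rather than hard-coding it.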