add B200 Qwen SGLang BF16 #702

@functionstackx

@claude look at benchmarks/dsr1_fp8_b200.sh and nvidia-master.yaml for the DeepSeek SGLang FP8 B200 setup, then create a new benchmarks/.sh for Qwen and update nvidia-master.yaml accordingly

example command

PYTHONNOUSERSITE=1 python -m sglang.launch_server \
  --model-path=$MODEL --host=0.0.0.0 --port=$PORT \
  --model Qwen/Qwen3.5-397B-A17B \
  --tp 8 \
  --mem-fraction-static 0.8

here is the recipe documentation link: https://cookbook.sglang.io/autoregressive/Qwen/Qwen3.5#52-speed-benchmark

it says that the container should be lmsysorg/sglang:nightly-dev-20260216-d3bae71e

don't do stuff like this:


    # Default: recv every ~10 requests; if CONC ≥ 16, relax to ~30 requests between scheduler recv polls.
    if [[ $TP -eq 8 ]]; then
        if [[ $CONC -ge 16 ]]; then
            SCHEDULER_RECV_INTERVAL=30
        else
            SCHEDULER_RECV_INTERVAL=10
        fi

        # Setting these values (passed in to --cuda-graph-max-bs and --max-running-requests) as the maximum concurrency
        # this will help us save memory from being unnecessarily used.
        MAX_RUNNING_REQUESTS=128
        CUDA_GRAPH_MAX_BATCH_SIZE=128

        MEM_FRAC_STATIC=0.82
        CHUNKED_PREFILL_SIZE=32768
        MAX_PREFILL_TOKENS=32768
    elif [[ $TP -eq 4 ]]; then
        if [[ $ISL -ne 8192 ]] || [[ $OSL -ne 1024 ]]; then
            echo "TP=4 not yet supported for ISL=$ISL OSL=$OSL!"
            exit 1
        fi

        # Setting these values (passed in to --cuda-graph-max-bs and --max-running-requests) as the maximum concurrency
        # this will help us save memory from being unnecessarily used.
        MAX_RUNNING_REQUESTS=32
        CUDA_GRAPH_MAX_BATCH_SIZE=32

        MEM_FRAC_STATIC=0.95
        CHUNKED_PREFILL_SIZE=8192
        MAX_PREFILL_TOKENS=8192

        SCHEDULER_RECV_INTERVAL=10
    else
        echo "Unrecognized TP size $TP!"
        exit 1
    fi
    echo "SCHEDULER_RECV_INTERVAL: $SCHEDULER_RECV_INTERVAL, CONC: $CONC, ISL: $ISL, OSL: $OSL"
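For contrast, here is a rough sketch of what a flat launch without the per-TP/CONC branching could look like. This is only one reading of the request, not the repository's actual script: it passes just the flags from the example command above, with `--tp 8` and `--mem-fraction-static 0.8` as defaults. The names `build_launch_cmd` and `LAUNCH_CMD` are hypothetical, and the sketch assumes `--model-path=$MODEL` alone is enough to select the model (so the separate `--model` flag from the example command is dropped, which is worth verifying against the SGLang server arguments).

```shell
#!/usr/bin/env bash
# Hypothetical sketch: assemble a flat SGLang launch command with no
# per-TP/CONC tuning. Function and variable names are illustrative only.

# Fills the global array LAUNCH_CMD. Returns 1 if MODEL or PORT is unset.
build_launch_cmd() {
    if [[ -z "${MODEL:-}" || -z "${PORT:-}" ]]; then
        echo "MODEL and PORT must be set" >&2
        return 1
    fi

    # Defaults taken from the example command in this issue.
    local tp="${TP:-8}"
    local mem_frac="${MEM_FRAC_STATIC:-0.8}"

    LAUNCH_CMD=(
        python -m sglang.launch_server
        --model-path="$MODEL"
        --host=0.0.0.0
        --port="$PORT"
        --tp "$tp"
        --mem-fraction-static "$mem_frac"
    )
}
```

With `MODEL=Qwen/Qwen3.5-397B-A17B` and a port of your choice exported, calling `build_launch_cmd` and then `PYTHONNOUSERSITE=1 "${LAUNCH_CMD[@]}"` would start the server; printing the array with `printf '%q ' "${LAUNCH_CMD[@]}"` instead gives a dry run. Keeping the command in an array avoids word-splitting problems if any value contains spaces.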

Metadata

    Projects

    Status

    Done
