Add Qwen3.5-397B-A17B BF16 B200 SGLang benchmark (STP only)#704
functionstackx merged 2 commits into main
Simple benchmark script without speculative decoding or MTP flags:
- No SGLANG_USE_CUDA_IPC_TRANSPORT
- No --reasoning-parser qwen3
- No --speculative-algo NEXTN or related flags
- TP=8, --mem-fraction-static 0.8
- Image: lmsysorg/sglang:nightly-dev-20260216-d3bae71e

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
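The settings listed in this commit message can be sketched as a launch command. This is a minimal sketch, not the script from the PR: the model path `Qwen/Qwen3.5-397B-A17B` and the port are assumptions, and `--tp 8` is used as the flag form of the "TP=8" setting. The command is only printed unless `RUN=1` is set.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical defaults: the model path and port are assumptions, not taken from the PR.
MODEL_PATH="${MODEL_PATH:-Qwen/Qwen3.5-397B-A17B}"
PORT="${PORT:-30000}"

# Build the launch command as an array so each flag stays a separate argument.
# No speculative-decoding flags and no --quantization flag are added (STP, BF16).
build_launch_cmd() {
  LAUNCH_CMD=(
    python3 -m sglang.launch_server
    --model-path "$MODEL_PATH"
    --tp 8
    --mem-fraction-static 0.8
    --port "$PORT"
  )
}

build_launch_cmd

# Dry run by default: print the command; set RUN=1 to actually launch the server.
if [[ "${RUN:-0}" == "1" ]]; then
  exec "${LAUNCH_CMD[@]}"
else
  printf '%s\n' "${LAUNCH_CMD[*]}"
fi
```

Keeping the flags in an array makes it easy to check what will be launched before committing GPU time to the run.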
@pr-claude review please
@claude review please
Claude finished @functionstackx's task in 3m 21s
Review of PR #704: Qwen3.5-397B-A17B FP8 B200 SGLang Benchmark
Summary
The PR adds a new benchmark config for Qwen3.5-397B-A17B FP8 on B200 with SGLang. The config YAML and perf-changelog entries look structurally correct. The main concern is that the benchmark script is very bare compared to existing B200 SGLang scripts and is missing the
Issues
1. Missing
prob gonna need a small refactor since it is bf16
@claude oops, the precision of the command i pasted is already bf16, can u change the file name to be bf16, and update nvidia-master.yaml and perf-changelog.yaml
Claude finished @functionstackx's task in 1m 18s
Renaming FP8 → BF16 across benchmark files
All three files updated and pushed in commit
The benchmark script runs without --quantization fp8, so the model loads in BF16. Rename the script, config key, and changelog entry to accurately reflect the precision.
- Rename benchmarks/qwen3.5_fp8_b200.sh → qwen3.5_bf16_b200.sh
- Update nvidia-master.yaml config key and precision field
- Update perf-changelog.yaml references and PR link

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
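The reasoning in this commit message (no `--quantization fp8` flag, so the label should be BF16) can be checked mechanically. The helper below is a hypothetical sketch, not part of the repository: `precision_label` is an invented name, and treating "no FP8 flag" as BF16 assumes the checkpoint's default dtype is BF16.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the precision label implied by a launch script:
# "fp8" if it passes --quantization fp8, otherwise "bf16"
# (assumes the checkpoint's default dtype is BF16).
precision_label() {
  local script="$1"
  # The "--" after the options stops grep from parsing the pattern as an option.
  if grep -Eq -- '--quantization[ =]fp8' "$script"; then
    echo "fp8"
  else
    echo "bf16"
  fi
}

# Usage example on a throwaway file that mimics the flags named in the PR.
tmp="$(mktemp)"
echo 'python3 -m sglang.launch_server --tp 8 --mem-fraction-static 0.8' > "$tmp"
precision_label "$tmp"
rm -f "$tmp"
```

A check like this could catch a file name that disagrees with the precision the script actually requests.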
@claude review plz
Claude finished @functionstackx's task in 1m 40s
Review of PR #704: Qwen3.5-397B-A17B BF16 B200 SGLang Benchmark (STP only)
After the FP8→BF16 rename, the script now correctly represents what it does: launching SGLang without
Issues
1. Missing
--speculative-algo NEXTN --speculative-num-steps 3
@claude create/submit/open new PR from main that updates pr-claude-review.yml, claude.yml and AGENT.md that says STP means single token prediction i.e. no spec decode or MTP
Claude finished @functionstackx's task in 1m 58s
Adding STP definition to project docs
Done
Added STP/MTP terminology definitions to three files:
STP = Single Token Prediction (standard autoregressive decoding, no speculative decoding or MTP)
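The STP definition can be read as a simple rule on launch flags. The sketch below is illustrative only: `decode_mode` is a hypothetical helper, and treating any flag that starts with `--speculative-` as a sign of speculative decoding or MTP is an assumption based on the flags quoted in this thread.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Classify a flag list: "STP" when no speculative-decoding flag is present,
# "non-STP" otherwise. Matching on the "--speculative-" prefix is an assumption.
decode_mode() {
  local arg
  for arg in "$@"; do
    if [[ "$arg" == --speculative-* ]]; then
      echo "non-STP"
      return 0
    fi
  done
  echo "STP"
}

# The flags from this PR's script, then the same flags with the
# speculative options quoted in the review thread added.
decode_mode --tp 8 --mem-fraction-static 0.8
decode_mode --tp 8 --speculative-algo NEXTN --speculative-num-steps 3
```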
Add Qwen3.5-397B-A17B FP8 B200 SGLang benchmark script and nvidia-master.yaml config (STP only)
- benchmarks/qwen3.5_fp8_b200.sh with basic SGLang launch flags
- Image: lmsysorg/sglang:nightly-dev-20260216-d3bae71e

Closes #702
Generated with Claude Code