23 changes: 23 additions & 0 deletions .github/configs/nvidia-master.yaml
@@ -1911,6 +1911,29 @@ glm5-fp4-b200-sglang:
        - { tp: 8, ep: 1, conc-start: 4, conc-end: 4 }
        - { tp: 4, ep: 1, conc-start: 4, conc-end: 256 }

# NOTE: At the time of submission, https://cookbook.sglang.io/autoregressive/GLM/GLM-5
# does not have a B300-specific recipe, so this config reuses the existing
# GLM-5 FP4 B200 SGLang recipe as-is until B300-specific tuning is available.
glm5-fp4-b300-sglang:
  image: lmsysorg/sglang:v0.5.10.post1-cu130
  model: nvidia/GLM-5-NVFP4
  model-prefix: glm5
  runner: b300
  precision: fp4
  framework: sglang
  multinode: false
  seq-len-configs:
    - isl: 1024
      osl: 1024
      search-space:
        - { tp: 8, ep: 1, conc-start: 4, conc-end: 4 }
        - { tp: 4, ep: 1, conc-start: 4, conc-end: 256 }
    - isl: 8192
      osl: 1024
      search-space:
        - { tp: 8, ep: 1, conc-start: 4, conc-end: 4 }
        - { tp: 4, ep: 1, conc-start: 4, conc-end: 256 }

qwen3.5-fp8-b200-sglang-mtp:
  image: lmsysorg/sglang:v0.5.9-cu130
  model: Qwen/Qwen3.5-397B-A17B-FP8
88 changes: 88 additions & 0 deletions benchmarks/single_node/glm5_fp4_b300.sh
@@ -0,0 +1,88 @@
#!/usr/bin/env bash

# NOTE: At the time of submission, https://cookbook.sglang.io/autoregressive/GLM/GLM-5
# does not have a B300-specific recipe, so this script reuses the existing
# GLM-5 FP4 B200 SGLang recipe as-is until B300-specific tuning is available.

source "$(dirname "$0")/../benchmark_lib.sh"

check_env_vars \
    MODEL \
    TP \
    CONC \
    ISL \
    OSL \
    RANDOM_RANGE_RATIO \
    RESULT_FILENAME \
    EP_SIZE

if [[ -n "$SLURM_JOB_ID" ]]; then
    echo "JOB $SLURM_JOB_ID running on $SLURMD_NODENAME"
fi

nvidia-smi

hf download "$MODEL"

SERVER_LOG=/workspace/server.log
PORT=${PORT:-8888}

echo "EP_SIZE: $EP_SIZE, CONC: $CONC, ISL: $ISL, OSL: $OSL"

EVAL_CONTEXT_ARGS=""
if [ "${EVAL_ONLY}" = "true" ]; then
    setup_eval_context
    EVAL_CONTEXT_ARGS="--context-length $EVAL_MAX_MODEL_LEN"
fi
# Start GPU monitoring (power, temperature, clocks every second)
start_gpu_monitor

set -x
PYTHONNOUSERSITE=1 python3 -m sglang.launch_server --model-path=$MODEL --host=0.0.0.0 --port=$PORT \
    --trust-remote-code \
    --tensor-parallel-size=$TP \
    --data-parallel-size 1 --expert-parallel-size $EP_SIZE \
    --disable-radix-cache \
    --quantization modelopt_fp4 \
    --kv-cache-dtype fp8_e4m3 \
    --nsa-decode-backend trtllm \
    --nsa-prefill-backend trtllm \
    --moe-runner-backend flashinfer_trtllm \
    --enable-flashinfer-allreduce-fusion \
    --cuda-graph-max-bs 256 \
    --max-prefill-tokens 32768 \
    --chunked-prefill-size 32768 \
    --mem-fraction-static 0.9 \
    --stream-interval 30 \
    --scheduler-recv-interval 10 \
    --tokenizer-worker-num 6 \
    --tokenizer-path $MODEL $EVAL_CONTEXT_ARGS > $SERVER_LOG 2>&1 &

SERVER_PID=$!

# Wait for server to be ready
wait_for_server_ready --port "$PORT" --server-log "$SERVER_LOG" --server-pid "$SERVER_PID"

pip install -q datasets pandas

run_benchmark_serving \
    --model "$MODEL" \
    --port "$PORT" \
    --backend vllm \
    --input-len "$ISL" \
    --output-len "$OSL" \
    --random-range-ratio "$RANDOM_RANGE_RATIO" \
    --num-prompts "$((CONC * 10))" \
    --max-concurrency "$CONC" \
    --result-filename "$RESULT_FILENAME" \
    --result-dir /workspace/

# After throughput, run evaluation only if RUN_EVAL is true
if [ "${RUN_EVAL}" = "true" ]; then
    run_eval --framework lm-eval --port "$PORT"
    append_lm_eval_summary
fi

# Stop GPU monitoring
stop_gpu_monitor
set +x
8 changes: 8 additions & 0 deletions perf-changelog.yaml
@@ -1420,3 +1420,11 @@
    - "Image: lmsysorg/sglang:v0.5.10.post1-cu130"
    - "At the time of submission, https://cookbook.sglang.io/autoregressive/GLM/GLM-5.1 does not have a B300-specific recipe, so this reuses the existing GLM5 FP8 B200 SGLang recipe as-is"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1051

- config-keys:
    - glm5-fp4-b300-sglang
  description:
    - "Add GLM-5 FP4 (NVFP4) B300 SGLang benchmark"
    - "Image: lmsysorg/sglang:v0.5.10.post1-cu130"
    - "At the time of submission, https://cookbook.sglang.io/autoregressive/GLM/GLM-5 does not have a B300-specific recipe, so this reuses the existing GLM-5 FP4 B200 SGLang recipe as-is"
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1058
🟡 The perf-changelog.yaml entry for glm5-fp4-b300-sglang has pr-link set to .../pull/XXXX instead of the actual PR number .../pull/1058. This should be corrected so the changelog permanently records the correct link.

Extended reasoning...

What the bug is and how it manifests

The perf-changelog.yaml entry for glm5-fp4-b300-sglang (the entry added by this PR) uses a placeholder value XXXX in the pr-link field instead of the actual PR number 1058. The field currently reads:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX

The specific code path that triggers it

The affected entry is the last entry in perf-changelog.yaml (line 1414), introduced by this PR. The PR diff itself shows the intended value as /pull/1058, but the code committed to main at commit 82d44b0 still contains the placeholder /XXXX.

Why existing code does not prevent it

There is no automated validation that checks perf-changelog.yaml pr-link fields for unreplaced placeholders like XXXX. The CI config validation would pass since the YAML is syntactically valid — the placeholder is just a wrong URL string.
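
A check of this kind could be a few lines of shell run in CI. The sketch below is an assumption, not something the repository is known to contain: the function name `check_pr_links` is hypothetical, and it assumes every `pr-link` value should end in `/pull/<digits>`.

```shell
# Hypothetical CI check (not part of the repository): fail when any pr-link
# line in a changelog file does not end in /pull/<number>.
check_pr_links() {
    changelog_file="$1"
    # Keep only pr-link lines, then drop the ones that end in /pull/<digits>.
    # "|| true" keeps the assignment from failing when grep finds nothing.
    bad_links="$(grep -nE '^[[:space:]]*pr-link:' "$changelog_file" \
        | grep -vE '/pull/[0-9]+[[:space:]]*$' || true)"
    if [ -n "$bad_links" ]; then
        echo "Malformed or placeholder pr-link in $changelog_file:" >&2
        echo "$bad_links" >&2
        return 1
    fi
    return 0
}
```

Called as `check_pr_links perf-changelog.yaml`, it returns non-zero and prints the offending line numbers when a placeholder such as `XXXX` is still present.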

Impact

The changelog will permanently record a broken link for this entry. Anyone referencing the changelog to find the PR that introduced the glm5-fp4-b300-sglang config will be directed to a non-existent GitHub URL instead of PR #1058.

How to fix it

Replace XXXX with 1058 in the pr-link field:

pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1058
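
One way to apply this from the command line is a targeted `sed` substitution. This is only a sketch: the helper name `fix_pr_link` is hypothetical, and it assumes the placeholder appears only at the end of `pr-link` lines.

```shell
# Hypothetical helper: replace the XXXX placeholder in pr-link lines
# with a real PR number. Other lines are left untouched.
fix_pr_link() {
    target_file="$1"
    pr_number="$2"
    # Restrict the substitution to pr-link fields that end in /pull/XXXX.
    # Writes to a temp file and moves it into place, avoiding non-portable "sed -i".
    sed -E "/^[[:space:]]*pr-link:/ s#/pull/XXXX[[:space:]]*\$#/pull/${pr_number}#" \
        "$target_file" > "$target_file.tmp" \
        && mv "$target_file.tmp" "$target_file"
}

# Usage: fix_pr_link perf-changelog.yaml 1058
```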

Step-by-step proof

  1. This PR is "Add B300 config: glm5-fp4-sglang" (#1058), as shown in the PR metadata.
  2. The PR diff (hunk for perf-changelog.yaml) shows the added line as + pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1058.
  3. However, the current file at HEAD (commit 82d44b0) shows the last entry's pr-link as https://github.com/SemiAnalysisAI/InferenceX/pull/XXXX.
  4. The comparable dsr1-fp4-b300-sglang entry directly above (from PR #1049, "Add B300 config: dsr1-fp4-sglang (non-MTP)") correctly reads /pull/1049.
  5. Conclusion: the placeholder XXXX was not replaced before the commit landed on main.
