Merged

Commits (61)
f4c7925
initial poc
cquil11 Nov 12, 2025
d068894
remove -d flag when launching docker container
cquil11 Nov 12, 2025
0a90d87
syntax error
cquil11 Nov 12, 2025
79c49cf
compatibility fixes
cquil11 Nov 12, 2025
de78e18
add correct endpoint prefix
cquil11 Nov 12, 2025
6a38bfb
remove reference env var
cquil11 Nov 12, 2025
f2e40ee
run vllm serve in background
cquil11 Nov 12, 2025
272c81d
unescape sequences
cquil11 Nov 12, 2025
4d27a9d
stop vllm to stdout after it stops
cquil11 Nov 12, 2025
5f549dc
stop vllm to stdout after it stops pt 2
cquil11 Nov 12, 2025
a3e7064
get rid of docker stop as no longer in detatched
cquil11 Nov 12, 2025
cabe362
clone bench serving to tmp dir
cquil11 Nov 12, 2025
3020926
clone bench serving to tmp dir pt 2
cquil11 Nov 12, 2025
142923b
add explanatory comment
cquil11 Nov 12, 2025
39fdbce
cleaning up
cquil11 Nov 12, 2025
6e10058
cleaning up
cquil11 Nov 13, 2025
710793d
adding mi355x refactor
cquil11 Nov 13, 2025
6b844a1
adding h200 initial refactor
cquil11 Nov 13, 2025
7ed5aa9
different way to see server logs
cquil11 Nov 13, 2025
fac8864
cleanup
cquil11 Nov 13, 2025
7c8f5a5
now fail if server fails
cquil11 Nov 13, 2025
9dd2b14
starting on b200
cquil11 Nov 13, 2025
69991c6
doign b200
cquil11 Nov 13, 2025
bdb3d39
reverting erroneous change
cquil11 Nov 13, 2025
a2a3db8
fixing b200
cquil11 Nov 14, 2025
f682f09
fixing b200 pt 2
cquil11 Nov 14, 2025
9eee9fd
updating mi300
cquil11 Nov 14, 2025
a692940
updating mi300 pt 2
cquil11 Nov 14, 2025
6869b36
updating mi300 pt 3 -- remove detached mode
cquil11 Nov 14, 2025
b8b6b46
cleaning up mi355x
cquil11 Nov 14, 2025
df64d8f
fixing mi300x and updating 325x
cquil11 Nov 14, 2025
def851c
reverting max conc to 512 on gptoss fp4 b200 docker
cquil11 Nov 14, 2025
ad97a0b
fixing mi300x and updating 325x
cquil11 Nov 14, 2025
284c11f
cleanng up
cquil11 Nov 14, 2025
5ca8d5b
add wait for h200 slurm dsr1
cquil11 Nov 14, 2025
6d300f4
max num seqs back to 512 for gptoss fpr b200 docker
cquil11 Nov 14, 2025
b026d28
fix port issue for dsr1 mi300x docker
cquil11 Nov 14, 2025
9e38f87
fix mi355x docker NUM_PROMPTS
cquil11 Nov 14, 2025
ee2105b
adding prop of failure for server logs
cquil11 Nov 14, 2025
9dd3e1a
add utils function for benchmark
cquil11 Nov 14, 2025
da39840
add utils function for benchmark
cquil11 Nov 14, 2025
ce96ec7
function-ize the waiting for server to start
cquil11 Nov 14, 2025
b02b77e
dont show arg parsing set -x
cquil11 Nov 14, 2025
01e8561
dont show arg parsing set +x oops
cquil11 Nov 14, 2025
c2c6c3c
dont show arg parsing set +x oops
cquil11 Nov 14, 2025
a56be97
capture server pid
cquil11 Nov 14, 2025
32a0d23
nebdius dont scancel
cquil11 Nov 17, 2025
2a085dc
changes to comments in benchmark lib . sh
cquil11 Nov 17, 2025
18cf708
Update benchmarks/dsr1_fp4_mi355x_docker.sh
cquil11 Nov 17, 2025
3809194
Update .github/workflows/benchmark-tmpl.yml
cquil11 Nov 17, 2025
0c75238
adding back whitespace
cquil11 Nov 17, 2025
6dad972
adding back whitespace
cquil11 Nov 17, 2025
80b2cbb
adding back whitespace
cquil11 Nov 17, 2025
227db82
remove tg launch script
cquil11 Nov 17, 2025
97edfee
Update benchmarks/gptoss_fp4_h100_docker.sh
cquil11 Nov 21, 2025
b6a80fb
Update benchmarks/dsr1_fp8_mi325x_docker.sh
cquil11 Nov 21, 2025
fa1e2c0
Update benchmarks/dsr1_fp8_mi355x_docker.sh
cquil11 Nov 21, 2025
a5ebc4a
Update benchmarks/gptoss_fp4_b200_trt_slurm.sh
cquil11 Nov 21, 2025
9ff641e
Audit and correct required environment variables documentation in all…
Copilot Nov 21, 2025
6eb8285
removing oci node rebase with main
cquil11 Nov 21, 2025
da48d92
Merge branch 'main' into refactor-docker-runner-launch
cquil11 Nov 21, 2025
214 changes: 214 additions & 0 deletions benchmarks/benchmark_lib.sh
@@ -0,0 +1,214 @@
#!/usr/bin/env bash

# Shared benchmarking utilities for InferenceMAX

# Wait for server to be ready by polling the health endpoint
# All parameters are required
# Parameters:
# --port: Server port
# --server-log: Path to server log file
# --server-pid: Server process ID (required)
# --sleep-interval: Sleep interval between health checks (optional, default: 5)
Comment on lines +6 to +11 — Copilot AI, Nov 21, 2025:
The comment states "All parameters are required" but --sleep-interval is listed as "(optional, default: 5)" in the parameter list. This is inconsistent. The comment should either say "Parameters (all required unless marked optional):" or the first line should be updated to reflect that not all parameters are required.
wait_for_server_ready() {
    set +x
    local port=""
    local server_log=""
    local server_pid=""
    local sleep_interval=5

    # Parse arguments
    while [[ $# -gt 0 ]]; do
        case $1 in
            --port)
                port="$2"
                shift 2
                ;;
            --server-log)
                server_log="$2"
                shift 2
                ;;
            --server-pid)
                server_pid="$2"
                shift 2
                ;;
            --sleep-interval)
                sleep_interval="$2"
                shift 2
                ;;
            *)
                echo "Unknown parameter: $1"
                return 1
                ;;
        esac
    done

    # Validate required parameters
    if [[ -z "$port" ]]; then
        echo "Error: --port is required"
        return 1
    fi
    if [[ -z "$server_log" ]]; then
        echo "Error: --server-log is required"
        return 1
    fi
    if [[ -z "$server_pid" ]]; then
        echo "Error: --server-pid is required"
        return 1
    fi

    # Show logs until server is ready
    tail -f "$server_log" &
    local TAIL_PID=$!
    until curl --output /dev/null --silent --fail http://0.0.0.0:$port/health; do
        if ! kill -0 "$server_pid" 2>/dev/null; then
            echo "Server died before becoming healthy. Exiting."
            kill $TAIL_PID
            exit 1
        fi
        sleep "$sleep_interval"
    done
    kill $TAIL_PID
Comment on lines +60 to +70 — Copilot AI, Nov 21, 2025:
The tail -f background process could become orphaned if the kill $TAIL_PID command fails. Consider using kill $TAIL_PID 2>/dev/null || true to ensure the function doesn't exit with an error status, or add a trap to ensure cleanup happens.
Comment on lines +65 to +70 — Copilot AI, Nov 21, 2025:
The kill $TAIL_PID command at lines 65 and 70 should include error suppression (e.g., kill $TAIL_PID 2>/dev/null || true) to prevent the function from failing if the tail process has already terminated or doesn't exist. This is particularly important at line 65 where the script continues to exit 1 afterward.

Suggested change:
    kill $TAIL_PID
becomes:
    kill $TAIL_PID 2>/dev/null || true
}
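Both review comments above point at the same weakness: the background `tail -f` is cleaned up by hand on every exit path. A trap-based variant makes the cleanup unconditional. The following is a minimal sketch, not the code merged in this PR; the function name `wait_for_server_ready_trapped` and the positional-argument signature are illustrative assumptions.

```shell
# Sketch: trap-based cleanup for the log-streaming tail process.
# Hypothetical variant of wait_for_server_ready; not part of the PR.
wait_for_server_ready_trapped() {
    local port="$1" server_log="$2" server_pid="$3" sleep_interval="${4:-5}"

    tail -f "$server_log" &
    local tail_pid=$!
    # The RETURN trap fires on every exit path of this function, so the tail
    # process cannot be orphaned; "|| true" swallows the error if tail has
    # already exited on its own.
    trap 'kill "$tail_pid" 2>/dev/null || true; trap - RETURN' RETURN

    until curl --output /dev/null --silent --fail "http://0.0.0.0:$port/health"; do
        if ! kill -0 "$server_pid" 2>/dev/null; then
            echo "Server died before becoming healthy." >&2
            return 1
        fi
        sleep "$sleep_interval"
    done
}
```

Returning 1 instead of calling exit also lets the caller decide whether a dead server is fatal, which the merged version (which calls exit 1) does not.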

# Run benchmark serving with standardized parameters
# All parameters are required
# Parameters:
# --model: Model name
# --port: Server port
# --backend: Backend type - e.g., 'vllm' or 'openai'
# --input-len: Random input sequence length
# --output-len: Random output sequence length
# --random-range-ratio: Random range ratio
# --num-prompts: Number of prompts
# --max-concurrency: Max concurrency
# --result-filename: Result filename without extension
# --result-dir: Result directory
run_benchmark_serving() {
    set +x
    local model=""
    local port=""
    local backend=""
    local input_len=""
    local output_len=""
    local random_range_ratio=""
    local num_prompts=""
    local max_concurrency=""
    local result_filename=""
    local result_dir=""

    # Parse arguments
    while [[ $# -gt 0 ]]; do
        case $1 in
            --model)
                model="$2"
                shift 2
                ;;
            --port)
                port="$2"
                shift 2
                ;;
            --backend)
                backend="$2"
                shift 2
                ;;
            --input-len)
                input_len="$2"
                shift 2
                ;;
            --output-len)
                output_len="$2"
                shift 2
                ;;
            --random-range-ratio)
                random_range_ratio="$2"
                shift 2
                ;;
            --num-prompts)
                num_prompts="$2"
                shift 2
                ;;
            --max-concurrency)
                max_concurrency="$2"
                shift 2
                ;;
            --result-filename)
                result_filename="$2"
                shift 2
                ;;
            --result-dir)
                result_dir="$2"
                shift 2
                ;;
            *)
                echo "Unknown parameter: $1"
                return 1
                ;;
        esac
    done

    # Validate all required parameters
    if [[ -z "$model" ]]; then
        echo "Error: --model is required"
        return 1
    fi
    if [[ -z "$port" ]]; then
        echo "Error: --port is required"
        return 1
    fi
    if [[ -z "$backend" ]]; then
        echo "Error: --backend is required"
        return 1
    fi
    if [[ -z "$input_len" ]]; then
        echo "Error: --input-len is required"
        return 1
    fi
    if [[ -z "$output_len" ]]; then
        echo "Error: --output-len is required"
        return 1
    fi
    if [[ -z "$random_range_ratio" ]]; then
        echo "Error: --random-range-ratio is required"
        return 1
    fi
    if [[ -z "$num_prompts" ]]; then
        echo "Error: --num-prompts is required"
        return 1
    fi
    if [[ -z "$max_concurrency" ]]; then
        echo "Error: --max-concurrency is required"
        return 1
    fi
    if [[ -z "$result_filename" ]]; then
        echo "Error: --result-filename is required"
        return 1
    fi
    if [[ -z "$result_dir" ]]; then
        echo "Error: --result-dir is required"
        return 1
    fi

    # Clone benchmark serving repo
    local BENCH_SERVING_DIR=$(mktemp -d /tmp/bmk-XXXXXX)
    git clone https://github.com/kimbochen/bench_serving.git "$BENCH_SERVING_DIR"

    # Run benchmark
    set -x
    python3 "$BENCH_SERVING_DIR/benchmark_serving.py" \
        --model "$model" \
        --backend "$backend" \
        --base-url "http://0.0.0.0:$port" \
        --dataset-name random \
        --random-input-len "$input_len" \
        --random-output-len "$output_len" \
        --random-range-ratio "$random_range_ratio" \
        --num-prompts "$num_prompts" \
        --max-concurrency "$max_concurrency" \
        --request-rate inf \
        --ignore-eos \
        --save-result \
        --percentile-metrics 'ttft,tpot,itl,e2el' \
        --result-dir "$result_dir" \
        --result-filename "$result_filename.json"
    set +x
}
Comment on lines +1 to +214 — Copilot AI, Nov 17, 2025:
The wait_for_server_ready function disables set -x at line 13 but the run_benchmark_serving function also disables it at line 87. However, neither function re-enables set -x after completing, which could cause inconsistent debug output behavior for code that follows these function calls.
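One way to address this review comment is to record whether xtrace was active on entry and restore it on the way out. The wrapper below is a hedged sketch, not code from the PR; the name with_xtrace_preserved is a hypothetical helper.

```shell
# Sketch: run a command with xtrace suppressed, then restore the caller's
# xtrace state. Hypothetical helper; not part of benchmark_lib.sh in the PR.
with_xtrace_preserved() {
    local had_xtrace=0
    # $- lists the shell's currently active single-letter option flags.
    if [[ $- == *x* ]]; then
        had_xtrace=1
    fi
    set +x

    # Run the wrapped command quietly, capturing its exit status.
    local rc=0
    "$@" || rc=$?

    # Restore xtrace only if the caller had it enabled.
    if (( had_xtrace )); then
        set -x
    fi
    return "$rc"
}
```

The library functions could then call `with_xtrace_preserved parse_and_validate ...` instead of issuing bare `set +x`, leaving the caller's debug output setting untouched.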
38 changes: 37 additions & 1 deletion benchmarks/dsr1_fp4_b200_docker.sh
@@ -1,11 +1,25 @@
#!/usr/bin/env bash

# === Required Env Vars ===
# MODEL
# PORT
# TP
# CONC
# ISL
# OSL
# RANDOM_RANGE_RATIO
# RESULT_FILENAME
# EP_SIZE
# NUM_PROMPTS

nvidia-smi

# To improve CI stability, we patch this helper function to prevent a race condition that
# happens 1% of the time. ref: https://github.com/flashinfer-ai/flashinfer/pull/1779
sed -i '102,108d' /usr/local/lib/python3.12/dist-packages/flashinfer/jit/cubin_loader.py

SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)

# Default: recv every ~10 requests; if CONC ≥ 16, relax to ~30 requests between scheduler recv polls.
if [[ $CONC -ge 16 ]]; then
SCHEDULER_RECV_INTERVAL=30
@@ -22,5 +36,27 @@ PYTHONNOUSERSITE=1 python3 -m sglang.launch_server --model-path $MODEL --host 0.
--cuda-graph-max-bs 256 --max-running-requests 256 --mem-fraction-static 0.85 --kv-cache-dtype fp8_e4m3 \
--chunked-prefill-size 16384 \
--ep-size $EP_SIZE --quantization modelopt_fp4 --enable-flashinfer-allreduce-fusion --scheduler-recv-interval $SCHEDULER_RECV_INTERVAL \
--enable-symm-mem --disable-radix-cache --attention-backend trtllm_mla --moe-runner-backend flashinfer_trtllm --stream-interval 10
--enable-symm-mem --disable-radix-cache --attention-backend trtllm_mla --moe-runner-backend flashinfer_trtllm --stream-interval 10 > $SERVER_LOG 2>&1 &

SERVER_PID=$!

# Source benchmark utilities
source "$(dirname "$0")/benchmark_lib.sh"

# Wait for server to be ready
wait_for_server_ready --port "$PORT" --server-log "$SERVER_LOG" --server-pid "$SERVER_PID"

pip install -q datasets pandas
Collaborator comment:
nitpick: maybe put the install command in benchmark_lib.sh?
run_benchmark_serving \
--model "$MODEL" \
--port "$PORT" \
--backend vllm \
--input-len "$ISL" \
--output-len "$OSL" \
--random-range-ratio "$RANDOM_RANGE_RATIO" \
--num-prompts "$NUM_PROMPTS" \
--max-concurrency "$CONC" \
--result-filename "$RESULT_FILENAME" \
--result-dir /workspace/

Copilot AI, Nov 17, 2025:
[nitpick] Extra blank line at the end of the file (line 50). This should be removed for consistency.
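The collaborator's nitpick above — moving `pip install -q datasets pandas` into benchmark_lib.sh — could look something like the sketch below. The helper name ensure_python_deps is an assumption, not code from the PR, and it assumes each pip package name matches its Python import name (true for datasets and pandas, but not in general).

```shell
# Sketch: install Python dependencies only when they are missing, so scripts
# that source benchmark_lib.sh repeatedly stay fast. Hypothetical helper.
ensure_python_deps() {
    local pkg
    for pkg in "$@"; do
        # Assumes the pip package name matches the import name.
        python3 -c "import $pkg" 2>/dev/null || pip install -q "$pkg"
    done
}
```

Each launch script could then replace its ad-hoc `pip install` line with `ensure_python_deps datasets pandas`.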
49 changes: 22 additions & 27 deletions benchmarks/dsr1_fp4_b200_trt_slurm.sh
@@ -1,16 +1,13 @@
#!/usr/bin/env bash

# === Required Env Vars ===
# HF_TOKEN
# HF_HUB_CACHE
# IMAGE
# === Required Env Vars ===
# MODEL
# TP
# CONC
# ISL
# OSL
# MAX_MODEL_LEN
# RANDOM_RANGE_RATIO
# TP
# CONC
# RESULT_FILENAME
# PORT_OFFSET
# DP_ATTENTION
@@ -100,24 +97,22 @@ mpirun -n 1 --oversubscribe --allow-run-as-root \
--extra_llm_api_options=$EXTRA_CONFIG_FILE \
> $SERVER_LOG 2>&1 &


set +x
while IFS= read -r line; do
printf '%s\n' "$line"
if [[ "$line" == *"Application startup complete"* ]]; then
break
fi
done < <(tail -F -n0 "$SERVER_LOG")

git clone https://github.com/kimbochen/bench_serving.git
set -x
python3 bench_serving/benchmark_serving.py \
--model $MODEL --backend openai \
--base-url http://0.0.0.0:$PORT \
--dataset-name random \
--random-input-len $ISL --random-output-len $OSL --random-range-ratio $RANDOM_RANGE_RATIO \
--num-prompts $(( $CONC * 10 )) --max-concurrency $CONC \
--request-rate inf --ignore-eos \
--save-result --percentile-metrics 'ttft,tpot,itl,e2el' \
--result-dir /workspace/ \
--result-filename $RESULT_FILENAME.json
SERVER_PID=$!

# Source benchmark utilities
source "$(dirname "$0")/benchmark_lib.sh"

# Wait for server to be ready
wait_for_server_ready --port "$PORT" --server-log "$SERVER_LOG" --server-pid "$SERVER_PID"

run_benchmark_serving \
--model "$MODEL" \
--port "$PORT" \
--backend openai \
--input-len "$ISL" \
--output-len "$OSL" \
--random-range-ratio "$RANDOM_RANGE_RATIO" \
--num-prompts $(( $CONC * 10 )) \
--max-concurrency "$CONC" \
--result-filename "$RESULT_FILENAME" \
--result-dir /workspace/
36 changes: 29 additions & 7 deletions benchmarks/dsr1_fp4_mi355x_docker.sh
@@ -1,14 +1,15 @@
#!/usr/bin/env bash

# ========= Required Env Vars =========
# HF_TOKEN
# HF_HUB_CACHE
# === Required Env Vars ===
# MODEL
# MAX_MODEL_LEN
# RANDOM_RANGE_RATIO
# PORT
# TP
# CONC
# PORT
# ISL
# OSL
# RANDOM_RANGE_RATIO
# RESULT_FILENAME
# NUM_PROMPTS
export SGLANG_USE_AITER=1

PREFILL_SIZE=196608
@@ -18,6 +19,8 @@ if [[ "$ISL" == "8192" && "$OSL" == "1024" ]]; then
fi
fi

SERVER_LOG=$(mktemp /tmp/server-XXXXXX.log)

set -x
python3 -m sglang.launch_server --model-path=$MODEL --trust-remote-code \
--host=0.0.0.0 --port=$PORT \
Expand All @@ -27,5 +30,24 @@ python3 -m sglang.launch_server --model-path=$MODEL --trust-remote-code \
--disable-radix-cache \
--num-continuous-decode-steps=4 \
--max-prefill-tokens=$PREFILL_SIZE \
--cuda-graph-max-bs=128
--cuda-graph-max-bs=128 > $SERVER_LOG 2>&1 &

SERVER_PID=$!

# Source benchmark utilities
source "$(dirname "$0")/benchmark_lib.sh"

# Wait for server to be ready
wait_for_server_ready --port "$PORT" --server-log "$SERVER_LOG" --server-pid "$SERVER_PID"

run_benchmark_serving \
--model "$MODEL" \
--port "$PORT" \
--backend vllm \
--input-len "$ISL" \
--output-len "$OSL" \
--random-range-ratio "$RANDOM_RANGE_RATIO" \
--num-prompts "$NUM_PROMPTS" \
--max-concurrency "$CONC" \
--result-filename "$RESULT_FILENAME" \
--result-dir /workspace/
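Each launch script in this PR documents its required environment variables only in a header comment (MODEL, PORT, TP, CONC, ISL, OSL, and so on), and an unset variable surfaces late as a server launch failure. A small guard in benchmark_lib.sh could fail fast instead. This is a speculative sketch; require_env_vars is a hypothetical helper, not part of the PR.

```shell
# Sketch: fail fast when a documented required environment variable is
# unset or empty. Hypothetical helper; not part of benchmark_lib.sh.
require_env_vars() {
    local missing=0 var
    for var in "$@"; do
        # ${!var} is bash indirect expansion: the value of the variable
        # whose name is stored in $var.
        if [[ -z "${!var:-}" ]]; then
            echo "Error: required env var $var is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example, mirroring the header of dsr1_fp4_mi355x_docker.sh:
# require_env_vars MODEL PORT TP CONC ISL OSL RANDOM_RANGE_RATIO RESULT_FILENAME NUM_PROMPTS
```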