Commit 2b21f34

Natfii and claude committed
chore(serve): bake flashinfer-autotune-off flag into serve-cute.sh
Per memory feedback_flashinfer_autotune_sm120, the SM120/GB10 host hard-reboots when flashinfer.jit's autotuner runs at serve startup (no clean OOM, no traceback, kernel panic).

Fix is universal: pass --kernel-config '{"enable_flashinfer_autotune":false}' to every vllm serve invocation in this repo. serve-cute.sh was missing it. serve.sh (triton_attn) is unaffected because it doesn't engage the cute_paged + flashinfer codepath.

Refs:
memory:feedback_flashinfer_autotune_sm120
Flashinfer issue vllm-project#2884, vLLM issue vllm-project#36999

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 76b88ba commit 2b21f34

1 file changed

Lines changed: 1 addition & 0 deletions

scripts/serve-cute.sh

@@ -96,6 +96,7 @@ docker run -d \
   --trust-remote-code \
   --gpu-memory-utilization "${SERVE_GPU_UTIL:-0.70}" \
   --max-num-batched-tokens 65536 \
+  --kernel-config '{"enable_flashinfer_autotune":false}' \
   "${EXTRA_ARGS[@]}"
 
 echo "Container started: $CONTAINER"
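
Since the commit message says the flag should be on every affected vllm serve invocation, a small guard could catch a script that is missing it. The following is only a sketch and is not part of this commit: the function name check_autotune_off is hypothetical, and it assumes the flag is written with exactly the spacing shown in the diff above.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not part of this commit): report whether a serve
# script contains the flashinfer autotune-off setting.
set -euo pipefail

# Assumption: the JSON is spelled exactly as in the diff (no extra spaces).
FLAG='"enable_flashinfer_autotune":false'

check_autotune_off() {
  # $1: path to a serve script.
  # Returns 0 if the fixed string is present, non-zero otherwise.
  grep -qF -- "$FLAG" "$1"
}
```

It could be called as check_autotune_off scripts/serve-cute.sh from a pre-commit hook or CI step, failing the run when the function returns non-zero. A plain fixed-string match is used on purpose; it will miss the flag if the JSON is reformatted, so a stricter check would need to parse the argument.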
