Commit 2b21f34
chore(serve): bake flashinfer-autotune-off flag into serve-cute.sh
Per memory feedback_flashinfer_autotune_sm120, the SM120/GB10 host
hard-reboots when flashinfer.jit's autotuner runs at serve startup
(no clean OOM, no traceback, just a kernel panic). The fix is
universal: pass --kernel-config '{"enable_flashinfer_autotune":false}'
to every vllm serve invocation in this repo.

serve-cute.sh was missing the flag. serve.sh (triton_attn) is
unaffected because it doesn't engage the cute_paged + flashinfer
codepath.
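
For illustration, here is a minimal sketch of how a serve script might carry the flag. The flag name and JSON value are copied verbatim from this commit message. MODEL, KERNEL_CONFIG, and build_serve_cmd are hypothetical names, not taken from serve-cute.sh. The sketch only prints the command instead of running it, so it is safe to try on any host.

```shell
#!/usr/bin/env bash
# Sketch only: not the actual contents of serve-cute.sh.
set -euo pipefail

# Hypothetical placeholder; substitute the model the script really serves.
MODEL="${MODEL:-example/model}"

# JSON payload from the commit message; single quotes keep the inner
# double quotes intact when the value is passed as one argument.
KERNEL_CONFIG='{"enable_flashinfer_autotune":false}'

build_serve_cmd() {
  # Print one argument per line so the quoting is easy to inspect.
  printf '%s\n' vllm serve "$MODEL" --kernel-config "$KERNEL_CONFIG"
}

build_serve_cmd
```

Keeping the JSON in a single-quoted variable means the shell passes it as one argument, which is the usual way to avoid quoting mistakes with inline JSON flags.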
Refs: memory:feedback_flashinfer_autotune_sm120
Flashinfer issue vllm-project#2884, vLLM issue vllm-project#36999
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 76b88ba commit 2b21f34
1 file changed
Lines changed: 1 addition & 0 deletions
[Diff body not captured: one line added at new line 99 of the changed file; surrounding context lines 96-98 and 99-101 (original numbering) are unchanged.]