I'm noticing that the server runs fine for `--random-input-len 1024 --random-output-len 1024 --max-concurrency 8` but crashes for `--random-input-len 4096 --random-output-len 1024 --max-concurrency 8` (input length **4096** instead of 1024).
Error:
(EngineCore pid=34835) File "/app/ATOM/atom/plugin/attention.py", line 349, in build
(EngineCore pid=34835) query_lens_cpu[num_decodes + num_extends :].max().item()
(EngineCore pid=34835) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=34835) RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
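For context, this is the standard PyTorch behavior when `.max()` is called with no `dim` argument on an empty tensor: the slice `query_lens_cpu[num_decodes + num_extends:]` presumably ends up empty for some batch compositions at the larger input length. A minimal sketch of the failure mode and a defensive guard (the variable values are hypothetical, not taken from the actual failing batch, and this is not ATOM's actual fix):

```python
import torch

# Hypothetical batch split: all requests are decodes/extends, so the
# tail slice that build() reduces over is empty.
query_lens_cpu = torch.tensor([3, 5, 7])
num_decodes, num_extends = 2, 1

tail = query_lens_cpu[num_decodes + num_extends:]  # empty tensor

try:
    # Reproduces the reported error: max() on an empty tensor with no
    # `dim` argument raises a RuntimeError.
    tail.max().item()
except RuntimeError as e:
    print(f"RuntimeError: {e}")

# One defensive pattern: guard on numel() before reducing.
max_tail_len = tail.max().item() if tail.numel() > 0 else 0
print(max_tail_len)  # → 0
```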
Docker used:
rocm/atom-dev:vllm-latest
ATOM commit: 58af3e4
vLLM commit: 0.19.1.dev0+g2a69949bd.d20260420.rocm722
Machine used: mi355
logs_client.txt
logs_server.txt
Server Launch cmd:
export ATOM_ENABLE_QK_NORM_ROPE_CACHE_QUANT_FUSION=1
export VLLM_ROCM_USE_AITER=1
vllm serve /data/models/gpt-oss-120b/ -tp 1 --disable-uvicorn-access-log --no-enable-prefix-caching --port 8004 --kv-cache-dtype=fp8
Client Launch cmd:
vllm bench serve --model /data/models/gpt-oss-120b/ --dataset-name random --random-input-len 4096 --random-output-len 1024 --max-concurrency 8 --num-prompts 80 --percentile-metrics ttft,tpot,itl,e2el --metric-percentiles 99 --ignore-eos --temperature 0 --seed 0 --trust-remote-code