Draft
77 commits
894b08e
[AMD] Add vLLM disaggregated prefill-decode benchmark for MI355X
chunfangamd Mar 11, 2026
1c4ad3d
[AMD] Refactor vLLM disagg recipe: models.yaml, UCX cleanup, QoS support
chunfangamd Mar 11, 2026
04ab30d
[AMD] Update vLLM disagg recipe for v0.17.1 NixlConnector API
chunfangamd Mar 11, 2026
99ce774
[AMD] Make vLLM disagg recipe CI-compatible (mia1 cluster)
chunfangamd Mar 12, 2026
d16bd21
[AMD] Co-locate vLLM disagg router with prefill on NODE_RANK=0
chunfangamd Mar 12, 2026
cf4b88c
[AMD] Use public vLLM base image with runtime dependency install
chunfangamd Mar 12, 2026
1b46ce5
[AMD] Enable Expert Parallelism with MoRI all-to-all on vLLM disagg d…
chunfangamd Mar 13, 2026
585ddb4
[AMD] Switch vLLM disagg KV transfer to MoRI-IO with protocol-aware p…
chunfangamd Mar 13, 2026
69fcdbd
[AMD] BUG fix: RANDOM_RANGE_RATIO never reaches bench.sh
ichbinblau Mar 17, 2026
d214e79
Bug fix: 1. With DRY_RUN=1, node 0 skipped starting proxy/prefill but…
ichbinblau Mar 17, 2026
3ffcc74
[AMD] Fix vLLM disagg hang: READ mode support + safety timeouts
chunfangamd Mar 19, 2026
9129ead
Adapt vLLM disagg recipe for 9N mia1 cluster (mlx5 NICs)
chunfangamd Mar 21, 2026
728f91a
[AMD] Fix vLLM disagg sweep hang: KV cache leak + benchmark client ha…
chunfangamd Mar 22, 2026
a163fd6
[AMD] Fix vLLM disagg Slurm job never terminating after benchmark com…
chunfangamd Mar 22, 2026
cb52c29
[AMD] Enable MoRI-IO READ mode by default for vLLM disagg
chunfangamd Mar 22, 2026
25a0310
[AMD] Fix CI checkout failure caused by root-owned __pycache__ files
chunfangamd Mar 22, 2026
5bbc954
[AMD] Fix CI checkout EACCES by redirecting Python bytecache off NFS
chunfangamd Mar 23, 2026
89ae516
[AMD] Fix KV reaper deadlock on high-ISL disagg workloads
chunfangamd Mar 23, 2026
f611f47
[AMD] Enable reading PREFILL_TP,PREFILL_EP,PREFILL_DP_ATTN,DECODE_TP,…
ichbinblau Mar 24, 2026
bec9c09
Merge branch 'main' into chun-oren-theresa/vllm_disagg
chunfangamd Mar 25, 2026
c7f0f05
[AMD] Upgrade vLLM disagg image from v0.17.1 to v0.18.0
chunfangamd Mar 29, 2026
b1c1a2c
Merge branch 'main' into amd/vllm_disagg_mvp_dev
chunfangamd Mar 29, 2026
800e4f9
[AMD] Add Kimi-K2.5-MXFP4 disagg inference config (1P2D)
chunfangamd Mar 30, 2026
b0cad67
feat: add Dockerfile and scripts for vLLM disaggregated server setup
mpashkovskii Apr 2, 2026
0fb2f33
feat: add MiniMax M2.5 PD disaggregation recipe (1P2D, MoRI-EP + MoRI…
chunfangamd Apr 3, 2026
13fe483
feat: add Dockerfile and runtime patch for MiniMax M2.5 WideEP + MoRI
chunfangamd Apr 3, 2026
db832b2
Fix: rename minimaxm25 to minimaxm2.5 for CI naming consistency
chunfangamd Apr 3, 2026
796307b
Optimize: add --gpu-memory-utilization 0.95 and --block-size 32 to Mi…
chunfangamd Apr 3, 2026
3d03d77
Fix: MiniMax M2.5 disagg — require EP=8 for prefill, fix ROCm gate dtype
chunfangamd Apr 3, 2026
aabc1f7
Remove unused docker/minimax-m25-disagg/ directory
chunfangamd Apr 3, 2026
64376a7
feat: add Dockerfile and scripts for vLLM disaggregated server setup
mpashkovskii Apr 2, 2026
a7b460d
fix: add broadcom nic drivers in Dockerfile
simondanielsson Apr 7, 2026
85c7aff
fix: remove moriio_proxy.sh and installation deps
simondanielsson Apr 7, 2026
e75395e
feat: bump vllm image version to 0.18.1
simondanielsson Apr 7, 2026
661efbc
fix: re-add necessary deps for mori (e.g. msgpack)
simondanielsson Apr 8, 2026
696f8cb
fix: convert run_P/D.sh scripts to use docker run
simondanielsson Apr 8, 2026
de88dae
fix: inherit model path from served model name by default
simondanielsson Apr 8, 2026
0558bb9
Merge branch 'amd/vllm_disagg_mvp_mpashkov' of github.com:mpashkovski…
simondanielsson Apr 8, 2026
ac39e32
feat: add vllm-router binary compatible with vllm v0.18 (router sha f…
simondanielsson Apr 8, 2026
c0dddeb
fix: remove duplicate install_mori_proxy_deps
simondanielsson Apr 8, 2026
35ba1eb
fix: build vllm-router in dockerfile
simondanielsson Apr 8, 2026
5c05fea
fix: re-add container creation sync
simondanielsson Apr 8, 2026
24df788
fix: remove hardcoded model names/paths
simondanielsson Apr 8, 2026
7d606a3
fix: update vllm-router binary (although unused)
simondanielsson Apr 8, 2026
03be6c2
docs: add build command to dockerfile
simondanielsson Apr 8, 2026
ab92ee3
fix: sweep concurrency
simondanielsson Apr 24, 2026
00e505a
fix: readd proxy
simondanielsson Apr 24, 2026
b21b208
fix: switch to nightly vllm image and correct router image
simondanielsson Apr 24, 2026
ccab17d
tmp: temporarily uncomment unneeded deps and assert mori already inst…
simondanielsson Apr 24, 2026
40997fa
revert: revert disable rdma notifications patch
simondanielsson Apr 24, 2026
ff3ed62
revert: moriio toy proxy tmp changes
simondanielsson Apr 24, 2026
28d74fc
fix: add missing runtime dep for vllm router into docker
simondanielsson Apr 24, 2026
f370707
chore: --rm containers after complete
simondanielsson Apr 24, 2026
2415dd9
fix: add --kv-connector moriio
simondanielsson Apr 24, 2026
2f31371
fix: update patches and exit 1 if apply fails
simondanielsson Apr 24, 2026
010e805
fix: remove manual mori install
simondanielsson Apr 24, 2026
26bac7b
chore: properly cleanup vllm-router
simondanielsson Apr 24, 2026
9eee397
Merge branch 'amd/vllm_disagg_mvp_dev' into amd/vllm_disagg_mvp_mpashkov
simondanielsson Apr 28, 2026
c62e5e5
fix: use --kv-connector moriio in vllm-router in job.slurm
simondanielsson Apr 28, 2026
55ac48c
fix: remove moriio_proxy.py
simondanielsson Apr 28, 2026
15aaba5
chore: update run_P/D.sh scripts to be compatible with amd_utils scripts
simondanielsson Apr 28, 2026
285d065
Update patch
simondanielsson Apr 28, 2026
ba91c69
Make leaner dockerfile
simondanielsson Apr 28, 2026
05a2982
Change model to gpt-oss
simondanielsson Apr 28, 2026
9917a7d
Set SOCKET_IFNAME's only if not present
simondanielsson Apr 28, 2026
be31abe
Update concurrency sweep
simondanielsson Apr 28, 2026
2543810
delete vllm-router
simondanielsson Apr 28, 2026
d8cc9a7
Revert unused bench
simondanielsson Apr 28, 2026
ad47ede
fix: revert changes in old folder vllm_disagg_utils
simondanielsson Apr 28, 2026
fbc60e5
fix: set kimi k2.5 and minimax M2.5 vllm images to nightly
simondanielsson Apr 28, 2026
a88fcac
tmp: Add gpt-oss sweep results
simondanielsson Apr 28, 2026
24b02df
rename results folder
simondanielsson Apr 28, 2026
53f70e2
fix: remove reduce_results=False from minimax patch
simondanielsson Apr 28, 2026
a556550
fix: update minimax patch to work with nightly
simondanielsson Apr 28, 2026
a9a63c7
Update run_P/D scripts to use minimax m2.5
simondanielsson Apr 28, 2026
6b29b39
Sweep concurrencies
simondanielsson Apr 28, 2026
006dbc6
add minimax 1k/1k results
simondanielsson Apr 28, 2026
4 changes: 2 additions & 2 deletions .github/configs/amd-master.yaml
@@ -993,7 +993,7 @@ dsr1-fp8-mi355x-sglang-disagg-mtp:
       - "DECODE_MTP_SIZE=2"

 kimik2.5-fp4-mi355x-vllm-disagg:
-  image: vllm/vllm-openai-rocm:v0.18.0
+  image: vllm/vllm-openai-rocm:nightly-100c7b65e7579c8caf4ee0b04a6410b2796b905c
   model: amd/Kimi-K2.5-MXFP4
   model-prefix: kimik2.5
   runner: mi355x-disagg
@@ -1046,7 +1046,7 @@ kimik2.5-fp4-mi355x-vllm-disagg:
       - "DECODE_NODES=2"

 minimaxm2.5-fp8-mi355x-vllm-disagg:
-  image: vllm/vllm-openai-rocm:v0.18.0
+  image: vllm/vllm-openai-rocm:nightly-100c7b65e7579c8caf4ee0b04a6410b2796b905c
   model: MiniMaxAI/MiniMax-M2.5
   model-prefix: minimaxm2.5
   runner: mi355x-disagg
9 changes: 7 additions & 2 deletions benchmarks/multi_node/amd_utils/env.sh
@@ -32,8 +32,13 @@ fi
 export IBDEVICES

 # Shared: Auto-detect default network interface (portable across clusters)
-export GLOO_SOCKET_IFNAME=$(ip route | grep '^default' | awk '{print $5}' | head -n 1)
-export NCCL_SOCKET_IFNAME=$(ip route | grep '^default' | awk '{print $5}' | head -n 1)
+# Only auto-detect if not already set by the runner/environment
+if [[ -z "$GLOO_SOCKET_IFNAME" ]]; then
+    export GLOO_SOCKET_IFNAME=$(ip route 2>/dev/null | grep '^default' | awk '{print $5}' | head -n 1)
+fi
+if [[ -z "$NCCL_SOCKET_IFNAME" ]]; then
+    export NCCL_SOCKET_IFNAME=$(ip route 2>/dev/null | grep '^default' | awk '{print $5}' | head -n 1)
+fi

set +x
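
The env.sh change above keeps an interface name supplied by the runner and falls back to auto-detection only when the variable is unset or empty. A minimal sketch of the same idea using the `${VAR:=default}` parameter expansion follows. `detect_default_iface` and `set_socket_ifnames` are hypothetical helper names that do not appear in the PR, and unlike the PR this sketch exports both variables even when they were already set.

```shell
# Hypothetical helper (not in the PR): print the device of the first default route.
# Prints nothing if `ip` is unavailable or no default route exists.
detect_default_iface() {
    ip route 2>/dev/null | awk '/^default/ { print $5; exit }'
}

# Hypothetical helper: keep any value the runner already provided and
# fall back to the detected interface only when a variable is unset or empty.
set_socket_ifnames() {
    default_iface=$(detect_default_iface)
    : "${GLOO_SOCKET_IFNAME:=$default_iface}"
    : "${NCCL_SOCKET_IFNAME:=$default_iface}"
    export GLOO_SOCKET_IFNAME NCCL_SOCKET_IFNAME
}

set_socket_ifnames
```

With `GLOO_SOCKET_IFNAME=ib0` already in the environment, the call leaves that value untouched and fills in only `NCCL_SOCKET_IFNAME`. Running the detection once and reusing the result also avoids invoking `ip route` twice.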

1 change: 1 addition & 0 deletions benchmarks/multi_node/amd_utils/job.slurm
@@ -416,6 +416,7 @@ if [[ \"$ENGINE\" == \"vllm-disagg\" && \"$ROUTER_TYPE\" == \"vllm-router\" && \
 \"$VLLM_ROUTER_IMAGE\" \
 bash -lc \"mkdir -p /run_logs/slurm_job-${SLURM_JOB_ID} && exec vllm-router \
 --vllm-pd-disaggregation \
+--kv-connector moriio \
 --vllm-discovery-address 0.0.0.0:${PROXY_PING_PORT} \
 --port ${ROUTER_PORT} \
 --host 0.0.0.0 \
327 changes: 0 additions & 327 deletions benchmarks/multi_node/amd_utils/moriio_proxy.py

This file was deleted.

4 changes: 2 additions & 2 deletions benchmarks/multi_node/amd_utils/patches/minimax_m2.py
@@ -137,7 +137,6 @@ def __init__(
             top_k=config.num_experts_per_tok,
             hidden_size=config.hidden_size,
             intermediate_size=config.intermediate_size,
-            reduce_results=False,
             renormalize=True,
             scoring_func=getattr(config, "scoring_func", "softmax"),
             e_score_correction_bias=self.e_score_correction_bias,
@@ -185,7 +184,8 @@ def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
             )
             final_hidden_states = final_hidden_states[:num_tokens]
         elif self.tp_size > 1:
-            final_hidden_states = self.experts.maybe_all_reduce_tensor_model_parallel(
+            from vllm.distributed.communication_op import tensor_model_parallel_all_reduce
+            final_hidden_states = tensor_model_parallel_all_reduce(
                 final_hidden_states
             )
