
[NVIDIA] Add Kimi K2.5 NVFP4 GB200 disaggregated vLLM benchmarks via Dynamo#1008

Merged
functionstackx merged 2 commits into main from nv/kimi2.5-disagg-gb200-8k1k-1k1k
Apr 7, 2026

Conversation

@nlevin-ui
Collaborator

- New framework: dynamo-vllm (Dynamo frontend + vLLM backend)
- 6 configs: 2x 1k1k + 4x 8k1k with varying prefill/decode ratios
- Recipes sourced from NVIDIA/srt-slurm branch sa-submission-q2-2026
- Runner script updated to clone NVIDIA/srt-slurm and map the vLLM container
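
For context, a single disaggregated entry along these lines might look as follows. This is a hypothetical sketch only: the entry name and key names are assumptions, not the actual nvidia-master.yaml schema; the values shown are taken from the first config verified later in this thread.

```yaml
# Hypothetical sketch of one disaggregated config entry; the entry name
# and all key names are assumptions, not the real nvidia-master.yaml schema.
kimik2.5-nvfp4-disagg-gb200-1p1d-1k1k:
  framework: dynamo-vllm
  prefill:
    num-worker: 1
    tp: 4
    ep: 4
    dp-attn: true
  decode:
    num-worker: 1
    tp: 16
    ep: 16
    dp-attn: true
  conc-list: [256, 512, 1024, 2048, 3072, 4096]
  config-file: 1k1k/disagg-gb200-1p1d-dep4-dep16.yaml
```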

Made-with: Cursor
@nlevin-ui nlevin-ui force-pushed the nv/kimi2.5-disagg-gb200-8k1k-1k1k branch from 4e4226e to 3e358a8 Compare April 6, 2026 23:34
@nlevin-ui nlevin-ui marked this pull request as ready for review April 6, 2026 23:41
@nlevin-ui nlevin-ui requested a review from a team April 6, 2026 23:41
@github-actions
Contributor

github-actions Bot commented Apr 6, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR there first before we can merge your PR into the master branch. Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

@claude
Contributor

claude Bot commented Apr 6, 2026

Claude finished @nlevin-ui's task in 1m 19s


Review of PR #1008

  • Gather context and read changed files
  • Validate model prefixes in config keys
  • Validate container image accessibility
  • Validate enroot import in launch script
  • Validate perf-changelog updated
  • Check benchmark script style
  • Post review summary

LGTM — no blocking issues found.

Validation summary:

  • Container image: vllm/vllm-openai:v0.18.0-cu130 is publicly accessible on Docker Hub ✅
  • Enroot import: Script already has enroot import -o $SQUASH_FILE docker://$IMAGE at line 58 ✅
  • Perf-changelog: Updated with detailed entry ✅
  • Model prefix: kimik2.5 is consistent with existing Kimi K2.5 configs already in nvidia-master.yaml
  • Config structure: 6 disaggregated configs (2× 1k1k, 4× 8k1k) with varying prefill/decode ratios look well-structured ✅
  • Launch script: Clean branching for dynamo-vllm framework with correct srt-slurm repo (NVIDIA/srt-slurm) and branch (sa-submission-q2-2026) ✅
  • Container mapping: "${IMAGE}": ${SQUASH_FILE} at line 172 correctly maps the vLLM Docker image name so srt-slurm recipes can resolve it ✅

Comment thread on runners/launch_gb200-nv.sh (outdated)
Comment thread on .github/configs/nvidia-master.yaml:
The error now reports both model prefix and precision so users can identify which field is wrong.
Made-with: Cursor
@functionstackx
Contributor

thanks @nlevin-ui can u debug the failing pr validation?

@cquil11
Collaborator

cquil11 commented Apr 7, 2026

@claude Verify that the parameters (EP, DP, TP, etc) in nvidia-master.yaml match those in the linked recipes from CONFIG_FILES. For all added entries.

@Klaud-Cold
Collaborator

Klaud-Cold commented Apr 7, 2026

Claude finished @cquil11's task in 2m 18s


Parameter verification: nvidia-master.yaml vs upstream recipes

  • Read added entries in nvidia-master.yaml
  • Identify all 6 CONFIG_FILE recipe paths
  • Fetch each recipe from NVIDIA/srt-slurm branch sa-submission-q2-2026
  • Compare EP, DP, TP, num-worker, and concurrency parameters
  • Post verification summary

All 6 configs match their upstream recipes. I fetched each recipe YAML from NVIDIA/srt-slurm@sa-submission-q2-2026 and verified the parallelism parameters (num-worker, tp, ep, dp-attn) and concurrency lists against the nvidia-master.yaml entries.

The mapping used: tp ↔ gpus_per_prefill/gpus_per_decode, ep ↔ effective expert-parallel size (from data-parallel-size when using DEP, or from tensor-parallel-size when using TEP), num-worker ↔ prefill_workers/decode_workers, dp-attn ↔ whether data-parallel is active.
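
That mapping can be sketched in Python. This is an illustrative sketch only: the recipe field names (`workers`, `gpus_per_worker`, `tensor_parallel_size`, `data_parallel_size`) are assumptions, not the actual srt-slurm recipe schema.

```python
# Illustrative sketch of the recipe -> nvidia-master.yaml parameter mapping.
# Recipe field names here are assumptions, not the actual srt-slurm schema.
def effective_params(side):
    """Map one side (prefill or decode) of a recipe onto the
    num-worker / tp / ep / dp-attn fields used in nvidia-master.yaml."""
    dp = side.get("data_parallel_size", 1)
    tp = side.get("tensor_parallel_size", 1)
    return {
        "num-worker": side["workers"],
        "tp": side["gpus_per_worker"],  # GPUs per worker
        "ep": dp if dp > 1 else tp,     # DEP takes the DP size, TEP the TP size
        "dp-attn": dp > 1,              # data-parallel attention active?
    }

# Config 1 prefill: workers=1, gpus=4, TP=1, DP=4  ->  1 / 4 / 4 / true
prefill = {"workers": 1, "gpus_per_worker": 4,
           "tensor_parallel_size": 1, "data_parallel_size": 4}
print(effective_params(prefill))

# Config 2 decode: workers=4, gpus=4, TP=4, no DP  ->  4 / 4 / 4 / false
decode = {"workers": 4, "gpus_per_worker": 4, "tensor_parallel_size": 4}
print(effective_params(decode))
```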

1. 1k1k/disagg-gb200-1p1d-dep4-dep16.yaml

| Parameter | nvidia-master.yaml | Recipe |
| --- | --- | --- |
| Prefill num-worker / tp / ep / dp-attn | 1 / 4 / 4 / true | workers=1, gpus=4, TP=1, DP=4, EP=true |
| Decode num-worker / tp / ep / dp-attn | 1 / 16 / 16 / true | workers=1, gpus=16, TP=1, DP=16, EP=true |
| conc-list | [256, 512, 1024, 2048, 3072, 4096] | 256x512x1024x2048x3072x4096 |
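
The recipes appear to encode the concurrency sweep as an `x`-separated string, while nvidia-master.yaml uses a list. A conversion is a one-liner; note the `x` encoding is inferred from the comparison tables in this thread, not confirmed against the recipe schema.

```python
# Convert a recipe-style "x"-separated concurrency string into the list
# form used by nvidia-master.yaml. The "x" encoding is inferred from the
# comparison tables above, not confirmed against the actual recipe schema.
def parse_conc_list(s: str) -> list[int]:
    return [int(tok) for tok in s.split("x")]

print(parse_conc_list("256x512x1024x2048x3072x4096"))
# [256, 512, 1024, 2048, 3072, 4096]
```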

2. 1k1k/disagg-gb200-1p4d-dep4-tep4.yaml

| Parameter | nvidia-master.yaml | Recipe |
| --- | --- | --- |
| Prefill num-worker / tp / ep / dp-attn | 1 / 4 / 4 / true | workers=1, gpus=4, TP=1, DP=4, EP=true |
| Decode num-worker / tp / ep / dp-attn | 4 / 4 / 4 / false | workers=4, gpus=4, TP=4, EP=true (no DP) |
| conc-list | [4, 8, 16, 32, 64, 128] | 4x8x16x32x64x128 |

3. 8k1k/disagg-gb200-1p4d-dep4-tep4.yaml

| Parameter | nvidia-master.yaml | Recipe |
| --- | --- | --- |
| Prefill num-worker / tp / ep / dp-attn | 1 / 4 / 4 / true | workers=1, gpus=4, TP=1, DP=4, EP=true |
| Decode num-worker / tp / ep / dp-attn | 4 / 4 / 4 / false | workers=4, gpus=4, TP=4, EP=true (no DP) |
| conc-list | [4, 8, 16, 32, 128] | 4x8x16x32x128 |

Note: The 8k1k 1P4D concurrency list intentionally omits 64, matching the upstream recipe exactly. This confirms the recipe author's deliberate choice (as @nlevin-ui stated in the earlier review comment).

4. 8k1k/disagg-gb200-3p1d-dep4-dep16.yaml

| Parameter | nvidia-master.yaml | Recipe |
| --- | --- | --- |
| Prefill num-worker / tp / ep / dp-attn | 3 / 4 / 4 / true | workers=3, gpus=4, TP=1, DP=4, EP=true |
| Decode num-worker / tp / ep / dp-attn | 1 / 16 / 16 / true | workers=1, gpus=16, TP=1, DP=16, EP=true |
| conc-list | [512, 1024] | 512x1024 |

5. 8k1k/disagg-gb200-5p1d-dep4-dep8.yaml

| Parameter | nvidia-master.yaml | Recipe |
| --- | --- | --- |
| Prefill num-worker / tp / ep / dp-attn | 5 / 4 / 4 / true | workers=5, gpus=4, TP=1, DP=4, EP=true |
| Decode num-worker / tp / ep / dp-attn | 1 / 8 / 8 / true | workers=1, gpus=8, TP=1, DP=8, EP=true |
| conc-list | [2048] | 2048 |

6. 8k1k/disagg-gb200-6p1d-dep4-dep16.yaml

| Parameter | nvidia-master.yaml | Recipe |
| --- | --- | --- |
| Prefill num-worker / tp / ep / dp-attn | 6 / 4 / 4 / true | workers=6, gpus=4, TP=1, DP=4, EP=true |
| Decode num-worker / tp / ep / dp-attn | 1 / 16 / 16 / true | workers=1, gpus=16, TP=1, DP=16, EP=true |
| conc-list | [3072, 4096] | 3072x4096 |

Result: 6/6 configs verified — no mismatches found.



@nlevin-ui
Collaborator Author

> thanks @nlevin-ui can u debug the failing pr validation?

@functionstackx can you be more specific? I don't see PR validation failing beyond what I had already fixed prior to your comment. I might just be missing something obvious.

@functionstackx
Contributor

> thanks @nlevin-ui can u debug the failing pr validation?
>
> @functionstackx can you be more specific? I don't see PR validation failing beyond what I had already fixed prior to your comment. I might just be missing something obvious.

nvm, i see that attempt #2 re-run has fixed it https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24058086986?pr=1008 thanks guys!

@functionstackx functionstackx left a comment

@functionstackx functionstackx merged commit 235b232 into main Apr 7, 2026
53 of 54 checks passed
@functionstackx functionstackx deleted the nv/kimi2.5-disagg-gb200-8k1k-1k1k branch April 7, 2026 19:33
@cquil11 cquil11 changed the title Add Kimi K2.5 NVFP4 GB200 disaggregated vLLM benchmarks via Dynamo [NVIDIA] Add Kimi K2.5 NVFP4 GB200 disaggregated vLLM benchmarks via Dynamo Apr 8, 2026
