[NVIDIA] Add Kimi K2.5 NVFP4 GB200 disaggregated vLLM benchmarks via Dynamo#1008
functionstackx merged 2 commits into main from
Conversation
nlevin-ui commented on Apr 6, 2026
- New framework: dynamo-vllm (Dynamo frontend + vLLM backend)
- 6 configs: 2x 1k1k + 4x 8k1k with varying prefill/decode ratios
- Recipes sourced from NVIDIA/srt-slurm branch sa-submission-q2-2026
- Runner script updated to clone NVIDIA/srt-slurm and map vLLM container
Made-with: Cursor
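The runner-script change described above can be sketched roughly as follows. This is a minimal illustration, not the actual script: the URL form and the `fetch_recipes` helper are assumptions; only the repo name (NVIDIA/srt-slurm) and branch (sa-submission-q2-2026) come from the PR description.

```python
import pathlib
import subprocess

# Assumed URL form; the PR only names the repo and the branch.
RECIPE_REPO = "https://github.com/NVIDIA/srt-slurm"
RECIPE_BRANCH = "sa-submission-q2-2026"

def fetch_recipes(dest: pathlib.Path) -> pathlib.Path:
    """Shallow-clone the recipe branch so the runner can read its YAML configs."""
    subprocess.run(
        ["git", "clone", "--depth", "1", "--branch", RECIPE_BRANCH,
         RECIPE_REPO, str(dest)],
        check=True,
    )
    return dest
```

A shallow, single-branch clone keeps the CI step fast since only the recipe YAMLs are needed, not the repo history.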
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you
Claude finished @nlevin-ui's task in 1m 19s

Review of PR #1008

LGTM: no blocking issues found. Validation summary:
The error now reports both the model prefix and the precision, so users can identify which field is wrong. Made-with: Cursor
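A minimal sketch of what such an error shape might look like. The function and field names here are hypothetical, invented for illustration; the actual validator lives in the repo's PR-validation workflow.

```python
def format_validation_error(model_prefix: str, precision: str,
                            allowed_precisions: list[str]) -> str:
    # Hypothetical helper: report both fields so the failing entry is
    # identifiable from the message alone.
    return (
        f"model prefix {model_prefix!r} with precision {precision!r} "
        f"did not validate; allowed precisions: {allowed_precisions}"
    )
```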
Thanks @nlevin-ui, can you debug the failing PR validation?
@claude Verify that the parameters (EP, DP, TP, etc.) in nvidia-master.yaml match those in the linked recipes from CONFIG_FILES, for all added entries.
Claude finished @cquil11's task in 2m 18s

Parameter verification: nvidia-master.yaml vs upstream recipes
All 6 configs match their upstream recipes. I fetched each recipe YAML from branch sa-submission-q2-2026 of NVIDIA/srt-slurm and compared it against the corresponding nvidia-master.yaml entry. The mapping used:

1. ✅
| Parameter | nvidia-master.yaml | Recipe |
|---|---|---|
| Prefill num-worker / tp / ep / dp-attn | 1 / 4 / 4 / true | workers=1, gpus=4, TP=1, DP=4, EP=true |
| Decode num-worker / tp / ep / dp-attn | 1 / 16 / 16 / true | workers=1, gpus=16, TP=1, DP=16, EP=true |
| conc-list | [256, 512, 1024, 2048, 3072, 4096] | 256x512x1024x2048x3072x4096 |
2. 1k1k/disagg-gb200-1p4d-dep4-tep4.yaml ✅
| Parameter | nvidia-master.yaml | Recipe |
|---|---|---|
| Prefill num-worker / tp / ep / dp-attn | 1 / 4 / 4 / true | workers=1, gpus=4, TP=1, DP=4, EP=true |
| Decode num-worker / tp / ep / dp-attn | 4 / 4 / 4 / false | workers=4, gpus=4, TP=4, EP=true (no DP) |
| conc-list | [4, 8, 16, 32, 64, 128] | 4x8x16x32x64x128 |
3. 8k1k/disagg-gb200-1p4d-dep4-tep4.yaml ✅
| Parameter | nvidia-master.yaml | Recipe |
|---|---|---|
| Prefill num-worker / tp / ep / dp-attn | 1 / 4 / 4 / true | workers=1, gpus=4, TP=1, DP=4, EP=true |
| Decode num-worker / tp / ep / dp-attn | 4 / 4 / 4 / false | workers=4, gpus=4, TP=4, EP=true (no DP) |
| conc-list | [4, 8, 16, 32, 128] | 4x8x16x32x128 |
Note: The 8k1k 1P4D concurrency list intentionally omits 64, matching the upstream recipe exactly. This confirms the recipe author's deliberate choice (as @nlevin-ui stated in the earlier review comment).
4. 8k1k/disagg-gb200-3p1d-dep4-dep16.yaml ✅
| Parameter | nvidia-master.yaml | Recipe |
|---|---|---|
| Prefill num-worker / tp / ep / dp-attn | 3 / 4 / 4 / true | workers=3, gpus=4, TP=1, DP=4, EP=true |
| Decode num-worker / tp / ep / dp-attn | 1 / 16 / 16 / true | workers=1, gpus=16, TP=1, DP=16, EP=true |
| conc-list | [512, 1024] | 512x1024 |
5. 8k1k/disagg-gb200-5p1d-dep4-dep8.yaml ✅
| Parameter | nvidia-master.yaml | Recipe |
|---|---|---|
| Prefill num-worker / tp / ep / dp-attn | 5 / 4 / 4 / true | workers=5, gpus=4, TP=1, DP=4, EP=true |
| Decode num-worker / tp / ep / dp-attn | 1 / 8 / 8 / true | workers=1, gpus=8, TP=1, DP=8, EP=true |
| conc-list | [2048] | 2048 |
6. 8k1k/disagg-gb200-6p1d-dep4-dep16.yaml ✅
| Parameter | nvidia-master.yaml | Recipe |
|---|---|---|
| Prefill num-worker / tp / ep / dp-attn | 6 / 4 / 4 / true | workers=6, gpus=4, TP=1, DP=4, EP=true |
| Decode num-worker / tp / ep / dp-attn | 1 / 16 / 16 / true | workers=1, gpus=16, TP=1, DP=16, EP=true |
| conc-list | [3072, 4096] | 3072x4096 |
Result: 6/6 configs verified — no mismatches found.
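The cross-check above can be sketched in a few lines. The dict keys and YAML schema here are assumptions for illustration; the mapping itself (master `tp` corresponds to GPUs per worker, i.e. recipe `TP * DP`, and the recipes encode concurrency sweeps as `x`-separated values) follows the tables in this review.

```python
def parse_conc_sweep(sweep: str) -> list[int]:
    """Recipes encode the concurrency sweep as 'x'-separated values,
    e.g. '512x1024' becomes [512, 1024]."""
    return [int(v) for v in sweep.split("x")]

def entry_matches(master: dict, recipe: dict) -> bool:
    """Compare one nvidia-master.yaml entry against its upstream recipe.
    Keys are hypothetical; per the tables above, master 'tp' corresponds
    to GPUs per worker in the recipe, i.e. TP * DP."""
    return (
        master["num_worker"] == recipe["workers"]
        and master["tp"] == recipe["TP"] * recipe.get("DP", 1)
        and master["conc_list"] == parse_conc_sweep(recipe["sweep"])
    )
```

For example, the 3P1D decode side (master `1 / 16`, recipe `workers=1, TP=1, DP=16`) matches because 16 == 1 * 16; the 8k1k 1P4D concurrency list that omits 64 also matches because the comparison is against the recipe's literal sweep string.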
@functionstackx can you be more specific? I don't see PR validation failing beyond what I had already fixed prior to your comment. I might just be missing something obvious.
nvm, I see that the attempt #2 re-run has fixed it: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24058086986?pr=1008 thanks guys!
functionstackx left a comment:

lgtm, passed PR validation: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/24058086986?pr=1008