[NV] Add GB200 MegaMOE max throughput recipe#1218

Merged
Oseltamivir merged 2 commits into main from codex/inferencex-gb200-megamoe
Apr 29, 2026
Conversation

@alec-flowers
Collaborator

Summary

  • Add a GB200 Dynamo vLLM MegaMOE max-throughput srt-slurm recipe for DeepSeek-V4-Pro at conc=4096.
  • Wire the recipe into dsv4-fp4-gb200-dynamo-vllm as a fourth 8k/1k point alongside low-latency, mid-curve, and offload max-tpt.
  • Append a perf changelog entry describing the MegaMOE/no-offload runtime shape.
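The wiring described in the second bullet might look roughly like the sketch below. Note this is a hypothetical illustration: the actual schema of `.github/configs/nvidia-master.yaml` and the recipe names are not shown in this PR, so every field name here is an assumption.

```yaml
# Hypothetical sketch only -- field names and nesting are illustrative,
# not the repo's actual config schema.
dsv4-fp4-gb200-dynamo-vllm:
  seq-lens:
    8k1k:
      - recipe: low-latency
      - recipe: mid-curve
      - recipe: offload-max-tpt
      # New fourth point: MegaMOE max throughput, no offload, conc=4096
      - recipe: megamoe-max-tpt
```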

Validation

  • git diff --check
  • python3 utils/matrix_logic/generate_sweep_configs.py full-sweep --config-files .github/configs/nvidia-master.yaml --model-prefix dsv4 --precision fp4 --framework dynamo-vllm --runner-type gb200 --seq-lens 8k1k --multi-node --no-evals
  • python3 utils/process_changelog.py --base-ref origin/main --head-ref HEAD --changelog-file perf-changelog.yaml --trim-conc

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR upstream first before we merge your PR into the master branch. Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that all GitHub Actions jobs pass after merging. Much of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

1 similar comment

@alec-flowers alec-flowers force-pushed the codex/inferencex-gb200-megamoe branch from 9049ed8 to b63da6a Compare April 29, 2026 01:45
Comment thread perf-changelog.yaml Outdated
@alec-flowers
Collaborator Author

| Metric | Value |
| --- | --- |
| total tok/s/GPU | 8957.06 |
| tok/s/user | 16.41 |

Comment thread .github/configs/nvidia-master.yaml
@cquil11
Collaborator

cquil11 commented Apr 29, 2026

@alec-flowers instead of just 3 data points, can you extend to highlight more parts of the curve so we are not overly relying on interpolation (which is imperfect in some cases)?

It is nice to have at least 6-7 points on the curve.

@alec-flowers
Collaborator Author

alec-flowers commented Apr 29, 2026

> @alec-flowers instead of just 3 data points, can you extend to highlight more parts of the curve so we are not overly relying on interpolation (which is imperfect in some cases)?
>
> It is nice to have at least 6-7 points on the curve.

Yes, for sure, but can we get that in over the coming days? The long pole here is the submission and queuing process. I had jobs with 6-7 points that got edited and cancelled, and a run like that takes about 10 hours. So what I'm trying to do now is get a baseline down and then fill it in.

The goal is to get the full Pareto frontier, and I'm working hard on it.

Comment thread .github/configs/nvidia-master.yaml
Collaborator

@Oseltamivir Oseltamivir left a comment


lgtm

@Oseltamivir Oseltamivir merged commit f444926 into main Apr 29, 2026
19 of 27 checks passed
@Oseltamivir Oseltamivir deleted the codex/inferencex-gb200-megamoe branch April 29, 2026 04:41
