[NV] Add GB200 MegaMOE max throughput recipe#1218

Merged
Oseltamivir merged 2 commits into main from codex/inferencex-gb200-megamoe
Apr 29, 2026
Conversation

@alec-flowers
Collaborator

Summary

  • Add a GB200 Dynamo vLLM MegaMOE max-throughput srt-slurm recipe for DeepSeek-V4-Pro at conc=4096.
  • Wire the recipe into dsv4-fp4-gb200-dynamo-vllm as a fourth 8k/1k point alongside low-latency, mid-curve, and offload max-tpt.
  • Append a perf changelog entry describing the MegaMOE/no-offload runtime shape.
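The wiring described in the second bullet might look roughly like the sketch below. Note this is a hypothetical illustration: the actual schema of `.github/configs/nvidia-master.yaml` and the recipe names are not shown in this PR, so every field name here is an assumption.

```yaml
# Hypothetical sketch only -- field names and nesting are illustrative,
# not the repo's actual config schema.
dsv4-fp4-gb200-dynamo-vllm:
  seq-lens:
    8k1k:
      - recipe: low-latency
      - recipe: mid-curve
      - recipe: offload-max-tpt
      # New fourth point: MegaMOE max throughput, no offload, conc=4096
      - recipe: megamoe-max-tpt
```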

Validation

  • git diff --check
  • python3 utils/matrix_logic/generate_sweep_configs.py full-sweep --config-files .github/configs/nvidia-master.yaml --model-prefix dsv4 --precision fp4 --framework dynamo-vllm --runner-type gb200 --seq-lens 8k1k --multi-node --no-evals
  • python3 utils/process_changelog.py --base-ref origin/main --head-ref HEAD --changelog-file perf-changelog.yaml --trim-conc

@github-actions
Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR upstream first before we merge your PR into the master branch. Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that all GitHub Actions jobs pass after merging. Much of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If failed jobs are re-run, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get approval from the respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

1 similar comment

@alec-flowers alec-flowers force-pushed the codex/inferencex-gb200-megamoe branch from 9049ed8 to b63da6a Compare April 29, 2026 01:45
Comment thread perf-changelog.yaml Outdated
@alec-flowers
Collaborator Author

| Metric | Value |
| --- | --- |
| total tok/s/GPU | 8957.06 |
| tok/s/user | 16.41 |

Comment thread .github/configs/nvidia-master.yaml
@cquil11
Collaborator

cquil11 commented Apr 29, 2026

@alec-flowers instead of just 3 data points, can you extend to highlight more parts of the curve so we are not overly relying on interpolation (which is imperfect in some cases)?

It is nice to have at least 6-7 points on the curve.

@alec-flowers
Collaborator Author

alec-flowers commented Apr 29, 2026

> @alec-flowers instead of just 3 data points, can you extend to highlight more parts of the curve so we are not overly relying on interpolation (which is imperfect in some cases)?
>
> It is nice to have at least 6-7 points on the curve.

Yes, for sure, but can we get that in over the coming days? The long pole here is the submission and queuing process. I had jobs with 6-7 points that got edited and cancelled, and a run like that takes about 10 hours. So what I'm trying to do now is get a baseline down and then fill it in.

The goal is to get the full Pareto frontier, and I'm working hard on it.

Comment thread .github/configs/nvidia-master.yaml
Collaborator

@Oseltamivir Oseltamivir left a comment


lgtm

@Oseltamivir Oseltamivir merged commit f444926 into main Apr 29, 2026
19 of 27 checks passed
@Oseltamivir Oseltamivir deleted the codex/inferencex-gb200-megamoe branch April 29, 2026 04:41
