[NV] Add GB200 MegaMOE max throughput recipe#1218
Conversation
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook. If they are not, please create a PR there first before we can merge your PR into the master branch. Let's ensure that the documentation is first class so the entire ML community can benefit from your hard work. Thank you!

PR authors are responsible for ensuring that, after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes, and simply re-running the failed jobs will fix them. If re-running failed jobs is attempted, PR authors are responsible for ensuring they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get PR approval from the respective company's CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack.
| total tok/s/GPU | 8957.06 |
@alec-flowers instead of just 3 data points, can you extend the sweep to cover more of the curve, so we are not overly relying on interpolation (which is imperfect in some cases)? It would be nice to have at least 6-7 points on the curve.
Yes, for sure. But can we get that in over the coming days? The long pole here is the submission and queuing process: I had jobs with 6-7 points that got edited and cancelled, and a full run takes about 10 hours. So what I'm trying to do now is land a baseline and then fill it in. The goal is the full Pareto curve, and I'm working hard on it.
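The interpolation concern above can be illustrated with a short sketch: with only a few measured points on a throughput-vs-concurrency curve, piecewise-linear interpolation between them can drift from the true shape of the knee, while a denser sweep pins it down. All numbers below are hypothetical placeholders, not results from this PR.

```python
import numpy as np

# Hypothetical measured points (concurrency, total tok/s/GPU);
# NOT actual benchmark results from this PR.
sparse_x = np.array([1, 64, 512])
sparse_y = np.array([150.0, 5200.0, 8957.0])

# With only 3 points, np.interp assumes the curve is a straight
# line between measurements, which can misestimate the knee.
est_at_128 = np.interp(128, sparse_x, sparse_y)

# A denser sweep (6-7 points) constrains the curve much better,
# so interpolated values between measurements drift less.
dense_x = np.array([1, 16, 64, 128, 256, 384, 512])
dense_y = np.array([150.0, 1900.0, 5200.0, 7100.0, 8300.0, 8700.0, 8957.0])
est_at_192 = np.interp(192, dense_x, dense_y)

print(est_at_128, est_at_192)
```

The gap between the sparse and dense estimates at the same concurrency is exactly the interpolation error the reviewer is asking to reduce by adding more measured points.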
Summary

Adds `dsv4-fp4-gb200-dynamo-vllm` as a fourth 8k/1k point alongside low-latency, mid-curve, and offload max-tpt.

Validation

```
git diff --check
python3 utils/matrix_logic/generate_sweep_configs.py full-sweep --config-files .github/configs/nvidia-master.yaml --model-prefix dsv4 --precision fp4 --framework dynamo-vllm --runner-type gb200 --seq-lens 8k1k --multi-node --no-evals
python3 utils/process_changelog.py --base-ref origin/main --head-ref HEAD --changelog-file perf-changelog.yaml --trim-conc
```