perf: [Perf recipe] Change TP 16->32 for deepseek GB200 sync benchmark #1715
terrykong merged 1 commit into NVIDIA-NeMo:main
Conversation
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
📝 Walkthrough
A single YAML configuration file is updated to increase the tensor parallelism parameter for vLLM generation from 16 to 32 in the DeepSeek v3 performance recipe configuration.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml
🧰 Additional context used
📓 Path-based instructions (2)
examples/configs/recipes/**/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
When adding support for a new model, create a recipe YAML under examples/configs/recipes/ in the appropriate domain subdirectory (llm, vlm, etc.)
Files:
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml
!(**/tests/**|**/test_*.py|**/test_*.sh)
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year
Files:
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml
🧠 Learnings (1)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2025-11-24T17:24:47.707Z
Learning: If a change could affect performance, the PR description should include before-and-after performance numbers, as well as the configuration and context in which they apply
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : Recipe YAML files should follow the naming pattern: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml for LLM recipes
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/vlm/*.yaml : Recipe YAML files should follow the naming pattern: vlm_<algo>-<model>-<nodes>n<gpus>g-<strategy>[-modifiers][.vN].yaml for VLM recipes
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to examples/configs/recipes/**/*.yaml : When adding support for a new model, create a recipe YAML under examples/configs/recipes/ in the appropriate domain subdirectory (llm, vlm, etc.)
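As a concrete instance of the LLM naming pattern above: the recipe touched by this PR, grpo-deepseek-v3-32n4g.yaml, decomposes as grpo (algo), deepseek-v3 (model), and 32n4g (32 nodes, 4 GPUs each).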
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: build-container / main
- GitHub Check: sphinx-build / Build docs
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
What does this PR do?
There seem to be issues with optimizer offloading on GB200: in the DeepSeek colocated benchmark, OOM happens unexpectedly during refit. I had to increase the vLLM TP size to avoid the OOM.
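For context, the change boils down to a single knob in the recipe. The sketch below assumes the key nesting NeMo RL recipes typically use for vLLM settings; it is not a verbatim diff, so check the YAML in the PR for the exact path.

```yaml
# examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml
# Sketch only: the exact key nesting is an assumption, not a verbatim diff.
policy:
  generation:
    vllm_cfg:
      tensor_parallel_size: 32  # was 16; wider TP shards weights across more GPUs,
                                # lowering per-GPU memory pressure during refit
```

The trade-off: with weights split 32 ways instead of 16, per-GPU throughput drops somewhat, but the transient memory spike when refitting updated policy weights into vLLM stays within the per-GPU limit.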
Issues
List issues that this PR closes:
Usage
# Add a code snippet demonstrating how to use this
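A minimal usage sketch, assuming the repo's usual pattern of passing the recipe YAML to the GRPO example script (the script name and flag are assumptions; the real 32-node benchmark likely goes through a Slurm/Ray launcher):

```bash
# Hypothetical invocation; script name and flag are assumptions, not from this PR.
uv run examples/run_grpo_math.py \
  --config examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n4g.yaml
```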
Before your PR is "Ready for review"
Pre checks:
Additional Information