cp: feat: Onboard perf recipes in tests (1322) into r0.4.0#1497
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Signed-off-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com>
Co-authored-by: Youngeun Kwon <youngeunk@nvidia.com>
Co-authored-by: Zhiyu Li <zhiyul@NVIDIA.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
📝 Walkthrough
This PR adds GRPO performance configurations and test suites for multiple LLM variants including DeepSeek v3, LLaMA 3.1, and Qwen3 models, featuring both standard and asynchronous training setups. It includes configuration files specifying training hyperparameters, model parallelism, hardware allocation, and logging, along with bash test scripts for performance validation and test infrastructure integration.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 2
🧹 Nitpick comments (1)
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml (1)
27-27: Add trailing newline. YAML files should end with a trailing newline per standard conventions.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (23)
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-async-1off.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g-async-1off.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-qwen3-32b-8n8g-async-1off.yaml (1 hunks)
tests/test_suites/llm/performance/common.env (1 hunks)
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh (1 hunks)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh (1 hunks)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g-async-1off.sh (1 hunks)
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh (1 hunks)
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-qwen3-32b-8n8g-async-1off.sh (1 hunks)
tests/test_suites/performance.txt (1 hunks)
tests/unit/test_recipes_and_test_suites.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
tests/test_suites/**
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Place driver shell scripts and common.env under tests/test_suites/<domain>/ and list nightly tests in tests/test_suites/nightly.txt
Files:
tests/test_suites/performance.txt
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-qwen3-32b-8n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
tests/test_suites/llm/performance/common.env
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh
examples/configs/recipes/**/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
examples/configs/recipes/**/*.yaml: Recipe YAMLs under examples/configs/recipes/** are runnable snapshots and may omit documentation
When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name
Files:
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml
examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-8n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml
examples/configs/recipes/**/*.{yaml,sh}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script
Files:
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml
examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-8n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml
examples/configs/recipes/**
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Place recipe YAMLs under examples/configs/recipes/<domain>/
Files:
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml
examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-8n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml
**/*.sh
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Use `uv run` to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling `python` directly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts
Files:
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-qwen3-32b-8n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
Files:
tests/unit/test_recipes_and_test_suites.py
🧠 Learnings (17)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2025-09-20T14:59:08.052Z
Learning: If a change could affect performance, include before-and-after performance numbers in the PR description, along with configuration and context.
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/nightly.txt : Append the new driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt
Applied to files:
tests/test_suites/performance.txt
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh
tests/unit/test_recipes_and_test_suites.py
📚 Learning: 2025-09-20T14:59:08.052Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2025-09-20T14:59:08.052Z
Learning: If a change could affect performance, include before-and-after performance numbers in the PR description, along with configuration and context.
Applied to files:
tests/test_suites/performance.txt
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.
Applied to files:
tests/test_suites/performance.txt
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-qwen3-32b-8n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
tests/test_suites/llm/performance/common.env
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh
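The NUM_RUNS expression in that learning is a ceiling division. A minimal sketch, with illustrative values only:

```shell
# Ceiling division: number of runs needed to cover MAX_STEPS
# when each run advances STEPS_PER_RUN steps.
MAX_STEPS=45        # illustrative value
STEPS_PER_RUN=10    # illustrative value
NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))
echo "$NUM_RUNS"    # 45 steps at 10 per run -> 5 runs
```

Adding `STEPS_PER_RUN - 1` before dividing rounds up, so a partial final run still counts as a run.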
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/llm/*.sh : LLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension
Applied to files:
tests/test_suites/performance.txt
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh
tests/test_suites/llm/performance/common.env
tests/unit/test_recipes_and_test_suites.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : LLM recipe YAML filenames must follow: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml
Applied to files:
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-8n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/**/*.{sh} : For new model support, add a matching driver shell script under tests/test_suites/<domain>/ that sources common.env and invokes 'uv run ... --config <yaml>'
Applied to files:
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
tests/test_suites/llm/performance/common.env
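A minimal driver script matching that guideline might look like the following skeleton. The entry-point script, config path, and step counts are illustrative placeholders, not taken from this PR; `<recipe>` stands for the actual YAML base name:

```shell
#!/bin/bash
# Hypothetical driver-script skeleton for tests/test_suites/<domain>/.
SCRIPT_DIR=$(dirname "$0")
source $SCRIPT_DIR/common.env

# Standard interface variables consumed by external launch tooling.
NUM_NODES=2
STEPS_PER_RUN=10
MAX_STEPS=30
NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))
NUM_MINUTES=120

exit_if_max_steps_reached

cd $PROJECT_ROOT
uv run examples/run_grpo_math.py \
    --config examples/configs/recipes/llm/performance/<recipe>.yaml \
    $@
```

The unquoted `cd $PROJECT_ROOT` and bare `$@` mirror the repository's established pattern recorded in the learnings above.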
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.
Applied to files:
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-qwen3-32b-8n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
tests/test_suites/llm/performance/common.env
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh
📚 Learning: 2025-09-19T07:28:29.887Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh:1-4
Timestamp: 2025-09-19T07:28:29.887Z
Learning: The NVIDIA-NeMo/RL project prefers to maintain consistent formatting across test scripts rather than applying individual bash hardening improvements like `set -euo pipefail` or proper quoting for sourcing files.
Applied to files:
tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh
tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g-async-1off.sh
tests/test_suites/llm/performance/common.env
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.{yaml,sh} : Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script
Applied to files:
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-8n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/vlm/*.yaml : VLM recipe YAML filenames must follow: vlm_<algo>-<model>-<nodes>n<gpus>g-<strategy>[-modifiers][.vN].yaml
Applied to files:
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-8n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
📚 Learning: 2025-09-18T14:57:31.003Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1006
File: nemo_rl/algorithms/distillation.py:312-354
Timestamp: 2025-09-18T14:57:31.003Z
Learning: The distillation algorithm's cluster setup logic is designed to follow the same patterns used in GRPO for handling distributed training clusters and resource allocation.
Applied to files:
examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.yaml : When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name
Applied to files:
examples/configs/recipes/llm/performance/grpo-qwen3-32b-8n8g-async-1off.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/*.yaml : When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/*.yaml
Applied to files:
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
📚 Learning: 2025-09-19T03:00:58.662Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:85-101
Timestamp: 2025-09-19T03:00:58.662Z
Learning: In distillation and GRPO configurations, max_new_tokens is intentionally set to the full context window (max_total_sequence_length) for consistency across the codebase. Overflow cases when prompt + generation tokens exceed max_model_len are handled by safeguards implemented in vllm_worker.py.
Applied to files:
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/** : Place driver shell scripts and common.env under tests/test_suites/<domain>/ and list nightly tests in tests/test_suites/nightly.txt
Applied to files:
tests/test_suites/llm/performance/common.env
tests/unit/test_recipes_and_test_suites.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/vlm/*.sh : VLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension
Applied to files:
tests/test_suites/llm/performance/common.env
🧬 Code graph analysis (10)
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh (1)
tests/test_suites/llm/performance/common.env (1)
exit_if_max_steps_reached(12-20)
tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh (1)
tests/test_suites/llm/performance/common.env (1)
exit_if_max_steps_reached(12-20)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh (1)
tests/test_suites/llm/performance/common.env (1)
exit_if_max_steps_reached(12-20)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
exit_if_max_steps_reached(12-20)
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
exit_if_max_steps_reached(12-20)
tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g-async-1off.sh (1)
tests/test_suites/llm/performance/common.env (1)
exit_if_max_steps_reached(12-20)
tests/test_suites/llm/performance/grpo-qwen3-32b-8n8g-async-1off.sh (1)
tests/test_suites/llm/performance/common.env (1)
exit_if_max_steps_reached(12-20)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
exit_if_max_steps_reached(12-20)
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
exit_if_max_steps_reached(12-20)
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
exit_if_max_steps_reached(12-20)
🪛 Shellcheck (0.11.0)
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 28-28: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh
[warning] 10-10: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 13-13: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 14-14: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 20-20: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 34-34: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 28-28: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 28-28: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 28-28: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g-async-1off.sh
[warning] 8-8: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 11-11: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 12-12: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 18-18: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 29-29: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-qwen3-32b-8n8g-async-1off.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 28-28: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 28-28: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh
[warning] 10-10: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 13-13: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 14-14: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 20-20: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 34-34: Double quote array expansions to avoid re-splitting elements.
(SC2068)
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh
[warning] 8-8: NUM_NODES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 11-11: NUM_RUNS appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 12-12: NUM_MINUTES appears unused. Verify use (or export if used externally).
(SC2034)
[warning] 18-18: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 29-29: Double quote array expansions to avoid re-splitting elements.
(SC2068)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (27)
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml (1)

1-44: LGTM! Configuration follows established patterns and is consistent with other GRPO recipe files in this PR. Filename, inheritance structure, and all settings align with the GRPO infrastructure.

examples/configs/recipes/llm/performance/grpo-qwen3-32b-8n8g-async-1off.yaml (1)

1-32: LGTM! Async variant correctly inherits from base config and properly enables async GRPO settings with importance sampling correction and async vLLM engine. Naming and configuration align with established patterns.

examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml (1)

1-59: LGTM! Configuration for large-scale Qwen3-235B model is well-structured with comprehensive distributed training settings (pipeline, context, and expert parallelism). The documented caveat about tensorboard is helpful context.

examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml (1)

1-41: LGTM! Base configuration for Qwen3-32B is properly structured. This serves as the parent config for async variants, with appropriate parallelism strategy for the 4-node cluster.

examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.yaml (1)

1-33: LGTM! Async variant properly enables async GRPO with trajectory management and importance sampling. Parallelism parameters are appropriately adjusted for the async configuration.

examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-async-1off.yaml (1)

1-33: LGTM! Large-scale async configuration for DeepSeek v3 is well-structured. Parallelism strategy (pure pipeline+expert with 32-way tensor parallelism for generation) is appropriate for the 64-node cluster.

tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh (2)
28-28: Consider quoting `$@` to avoid word-splitting (SC2068). Line 28 passes `$@` unquoted, which shellcheck flags as an error: "Double quote array expansions to avoid re-splitting elements." While recent codebase guidance (zpqiu, PR 1324) indicates using `$@` unquoted for consistency with existing test scripts, this is a potential source of subtle bugs if arguments contain spaces or special characters. For robustness, consider using `"$@"` instead.

Alternatively, if unquoted `$@` is an established pattern in this codebase worth preserving, you can acknowledge this as an intentional trade-off for consistency.
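The word-splitting behavior behind SC2068 can be seen in a small standalone sketch; the function names here are illustrative, not from the test scripts:

```shell
# Bare $@ re-splits arguments on whitespace; "$@" forwards them verbatim.
count_args() { echo $#; }
forward_unquoted() { count_args $@; }    # re-splits "a b" into two words
forward_quoted() { count_args "$@"; }    # keeps "a b" as one argument

forward_unquoted "a b" c    # prints 3
forward_quoted "a b" c      # prints 2
```

The bug only bites when a caller passes an argument containing whitespace, which is why the unquoted form survives in scripts whose arguments are always simple flags.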
1-39: LGTM! Script follows established test infrastructure patterns: standard config variables,uv runusage, environment variable referencing, and conditional metrics logic are all correct. One minor shellcheck concern noted separately above.examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml (1)
1-54: LGTM! Llama 3.1 8B Instruct configuration is well-structured for a 2-node cluster. Themax_new_tokens=4096setting correctly matchesmax_total_sequence_lengthper established patterns, with overflow handling delegated to vLLM.tests/test_suites/performance.txt (1)
1-15: LGTM!The performance test suite listing is well-organized, grouping standard and async variants separately. All listed paths correspond to test scripts included in this PR.
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml (1)
22-24: Verify log directory and experiment name match cluster configuration.

The log directory and WandB experiment name reference "1T1G" (suggesting 1 tensor parallel, 1 pipeline parallel) but the cluster configuration specifies 2 nodes with 8 GPUs per node. This naming inconsistency could cause confusion when analyzing results.
Consider aligning the naming with the cluster configuration or documenting the parallelism strategy if "1T1G" refers to a specific tensor/pipeline parallelism configuration.
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh (1)
1-39: LGTM!

The script follows the established test infrastructure patterns correctly, including standard configuration variables, early-exit guards, experiment execution via `uv run`, and conditional metrics validation.

tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh (1)
1-45: LGTM!

The script properly handles DeepSeek-specific requirements (NVLS disablement for OOM avoidance, configurable HF checkpoint path) while following the standard test infrastructure patterns.
tests/test_suites/llm/performance/common.env (1)
1-45: LGTM!

The common environment script provides robust infrastructure for performance tests, including proper error handling (`set -eou pipefail`), path validation, early-exit optimization, and dry-run support.
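A hypothetical sketch of that guard pattern — the function and variable names here are illustrative, not copied from `common.env`:

```shell
#!/usr/bin/env bash
set -u

# guard: validate the recipe path, honor DRY_RUN, otherwise launch.
# Purely illustrative; common.env's real structure may differ.
guard() {
  config="$1"
  if [ ! -f "$config" ]; then
    # Path validation: fail fast on a missing config.
    echo "config not found: $config" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Dry-run support: report the action instead of executing it.
    echo "dry-run: would launch $config"
    return 0
  fi
  echo "launching $config"
}
```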
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh (1)

1-39: LGTM!

The script follows the established test infrastructure patterns correctly for the LLaMA 3.1 8B instruct model performance testing.
tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g-async-1off.sh (1)
1-40: LGTM!

The script follows the established test infrastructure patterns correctly for the Qwen3 235B model, including NVLS disablement to avoid OOM issues on the large-scale 32-node deployment.
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh (1)
1-39: LGTM! Script follows the established test infrastructure pattern.

The script correctly implements the standard GRPO performance test flow: environment setup, early-exit guard, experiment execution with proper logging and checkpointing, TensorBoard log conversion, and conditional metrics validation.
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh (2)
1-39: Script implementation follows the standard pattern correctly.

Assuming the filename is verified to be correct, the script logic properly implements the GRPO performance test flow with appropriate configuration, logging, and metrics validation.
1-1: The shell script filename is correct and not a typo. The verification shows that `grpo-qwen3-30ba3b-4n8g.sh` has a corresponding YAML config at `grpo-qwen3-30ba3b-4n8g.yaml`, and the naming is consistent across both files. The "30ba3b" is an intentional model variant identifier, not a naming error. The shell script filename properly mirrors the YAML base name as required by the coding guidelines.

Likely an incorrect or invalid review comment.
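A throwaway check in that spirit — print every script base name lacking a same-named recipe. The function and directory arguments are illustrative, not tooling from the repository:

```shell
#!/usr/bin/env bash
# check_pairs SCRIPTS_DIR CONFIGS_DIR: report each .sh script with no
# matching .yaml recipe. Illustrative only, not from the PR.
check_pairs() {
  for script in "$1"/*.sh; do
    [ -e "$script" ] || continue            # directory had no scripts
    base=$(basename "$script" .sh)
    [ -f "$2/$base.yaml" ] || echo "missing recipe for $base"
  done
}
```

Run against `tests/test_suites/llm/performance` and `examples/configs/recipes/llm/performance`, it would surface any script/recipe name drift.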
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh (1)
1-45: LGTM! DeepSeek-specific configuration is appropriately handled.

The script correctly implements DeepSeek v3 requirements including NVLS disablement for OOM avoidance, configurable HF checkpoint path, and proper model/tokenizer overrides. The higher resource allocation (32 nodes, 240 minutes) aligns with the larger model scale.
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh (1)
1-6: LGTM! Environment setup is appropriate for large model testing.

The NVLS disablement is properly documented and necessary to avoid OOM issues with the 235B model.
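For reference, NCCL exposes a standard switch for this. Whether the script uses exactly this variable is an assumption on my part; the relevant line is not shown in this excerpt:

```shell
# Assumption: NVLS is disabled through NCCL's documented environment switch.
export NCCL_NVLS_ENABLE=0
```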
tests/unit/test_recipes_and_test_suites.py (4)
31-31: LGTM! Performance test suite path properly defined.

The constant follows the established naming pattern for test suite paths.
72-80: LGTM! Performance test suite fixture correctly implemented.

The fixture follows the same pattern as `nightly_test_suite` and `release_test_suite`, ensuring consistency across the test infrastructure.
83-89: LGTM! Performance suite properly integrated into test aggregation.

The `all_test_suites` fixture correctly includes the new `performance_test_suite` parameter and concatenates it with nightly and release suites.
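A minimal sketch of that collect-and-concatenate pattern. The constant name mirrors the one cited in the review, but the globbing and aggregation logic are illustrative assumptions, not the repository's actual implementation:

```python
from pathlib import Path

# Path cited in the review for the new suite.
PERFORMANCE_TEST_SUITE = Path("tests/test_suites/llm/performance")


def collect_suite(suite_dir: Path) -> list[Path]:
    # Gather every .sh test script under one suite directory, sorted
    # for deterministic parametrization order.
    return sorted(suite_dir.glob("*.sh"))


def aggregate_suites(*suites: list[Path]) -> list[Path]:
    # Flatten the per-suite script lists into one list, as the
    # all_test_suites fixture does with nightly + release + performance.
    return [script for suite in suites for script in suite]
```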
102-113: LGTM! Test parametrization updated to include performance suite.

The `test_test_suites_exist` test now validates all three test suites (nightly, release, and performance) with appropriate test IDs.

examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml (1)
1-57: LGTM! Configuration is comprehensive and follows established patterns.

The YAML properly configures:
- GRPO training parameters with appropriate batch sizes for the scale
- Megatron parallelism settings for DeepSeek v3 (EP16, PP16)
- vLLM generation with async engine (TP32)
- Proper cluster resource allocation (32 nodes × 8 GPUs)
The configuration aligns with the corresponding test script and follows the project's recipe naming conventions.
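For orientation, a config expressing those settings would look roughly like the fragment below. The key names follow common Megatron/vLLM-style layouts and are assumptions on my part, not the recipe's exact schema:

```yaml
# Illustrative shape only; key names may not match the repository's schema.
policy:
  megatron_cfg:
    expert_model_parallel_size: 16    # EP16
    pipeline_model_parallel_size: 16  # PP16
  generation:
    vllm_cfg:
      async_engine: true
      tensor_parallel_size: 32        # TP32 for generation
cluster:
  num_nodes: 32
  gpus_per_node: 8
```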
tests/test_suites/llm/performance/grpo-qwen3-32b-8n8g-async-1off.sh (1)
1-39: LGTM! Script correctly implements the GRPO performance test pattern.

The script properly configures and executes the GRPO Qwen3-32B experiment with appropriate logging, checkpointing, and metrics validation.
```yaml
  log_dir: logs/grpo-qwen3-235b-32n8g-16T16G-async-1off
  wandb:
    name: grpo-qwen3-235b-32n8g-16T16G-async-1off
```
Fix inconsistent log directory and experiment name.
The log directory and WandB experiment name reference "16T16G" but this configuration is for a 32-node, 8-GPU-per-node cluster (32n8g). This appears to be a copy-paste error from the 16n8g variant.
Apply this diff:
```diff
 logger:
-  log_dir: logs/grpo-qwen3-235b-32n8g-16T16G-async-1off
+  log_dir: logs/grpo-qwen3-235b-32n8g-async-1off
   wandb:
-    name: grpo-qwen3-235b-32n8g-16T16G-async-1off
+    name: grpo-qwen3-235b-32n8g-async-1off
```
🤖 Prompt for AI Agents
In
examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g-async-1off.yaml
around lines 30-32, the log_dir and wandb.name still contain "16T16G" from the
16n8g copy; update both entries so they reference the correct 32n8g
configuration (e.g., replace "16T16G" with "32T8G" so both log_dir and
wandb.name become consistent with the 32n8g-async-1off run).
```sh
cd $PROJECT_ROOT
uv run examples/run_grpo_math.py \
    --config $CONFIG_PATH \
    grpo.max_num_steps=$MAX_STEPS \
    logger.log_dir=$LOG_DIR \
    logger.wandb_enabled=True \
    logger.wandb.project=nemo-rl \
    logger.wandb.name=$EXP_NAME \
    logger.monitor_gpus=True \
    checkpointing.enabled=True \
    checkpointing.checkpoint_dir=$CKPT_DIR \
    $@ \
    2>&1 | tee $RUN_LOG
```
Add missing `tensorboard_enabled` configuration.
The script converts TensorBoard logs at line 33 but doesn't enable TensorBoard logging in the experiment configuration. This is inconsistent with other performance test scripts and may cause the log conversion to fail or produce empty results.
Apply this diff to enable TensorBoard logging:
```diff
     logger.monitor_gpus=True \
+    logger.tensorboard_enabled=True \
     checkpointing.enabled=True \
```
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 18-18: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
(SC2164)
[error] 29-29: Double quote array expansions to avoid re-splitting elements.
(SC2068)
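The SC2164 warning above is conventionally resolved by guarding the `cd`; a minimal sketch (the `safe_cd` wrapper is illustrative, not from the script):

```shell
#!/usr/bin/env bash
# Guard cd so a bad path aborts the caller instead of letting later
# commands run from the wrong directory (the SC2164 fix).
safe_cd() {
  cd "$1" || return 1
}
```

In the script itself, `cd $PROJECT_ROOT || exit 1` (with the variable quoted) achieves the same effect inline.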
🤖 Prompt for AI Agents
In tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh around lines 18 to
30, the experiment CLI call enables WandB but does not enable TensorBoard, yet
the script later converts TensorBoard logs; add the missing
logger.tensorboard_enabled=True flag to the uv run arguments (alongside the
existing logger.wandb.* and logger.monitor_gpus entries) so TensorBoard logging
is actually enabled for the run.
beep boop [🤖]: Hi @guyueh1 👋,
Summary by CodeRabbit
New Features
Tests