
cp: feat: Onboard perf recipes in tests (1322) into r0.4.0 #1497

Merged
terrykong merged 1 commit into r0.4.0 from cherry-pick-1322-r0.4.0
Nov 10, 2025

Conversation

@chtruong814
Contributor

@chtruong814 chtruong814 commented Nov 10, 2025

beep boop [🤖]: Hi @guyueh1 👋,

we've cherry-picked #1322 into r0.4.0 for you! 🚀

Please review and approve this cherry-pick at your convenience!

Summary by CodeRabbit

  • New Features

    • Added experiment configurations for model training supporting multiple model architectures and execution modes, including asynchronous training variants.
  • Tests

    • Introduced performance test suite with automated test scripts, log processing utilities, and metrics validation for training experiments.
    • Updated test suite registry to support new performance testing infrastructure.

Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Signed-off-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com>
Co-authored-by: Youngeun Kwon <youngeunk@nvidia.com>
Co-authored-by: Zhiyu Li <zhiyul@NVIDIA.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
@coderabbitai
Contributor

coderabbitai Bot commented Nov 10, 2025

📝 Walkthrough

Walkthrough

This PR adds GRPO performance configurations and test suites for multiple LLM variants, including DeepSeek v3, LLaMA 3.1, and Qwen3 models, with both standard and asynchronous training setups. It includes configuration files specifying training hyperparameters, model parallelism, hardware allocation, and logging, along with bash test scripts for performance validation and test-infrastructure integration.

Changes

Cohort / File(s) Summary
GRPO Configuration Files – DeepSeek v3
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml, grpo-deepseek-v3-64n8g-async-1off.yaml
Added comprehensive GRPO training configs for DeepSeek v3 with Megatron distributed settings, vLLM generation, importance sampling correction, checkpointing, and WandB logging. Async variant enables async_grpo with in-flight weight updates and specifies 64-node cluster.
GRPO Configuration Files – LLaMA 3.1
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml, grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
Added GRPO configs for LLaMA 3.1 8B Instruct with Megatron settings, async variant enabled with max_trajectory_age_steps=1, async_engine, and 2-node cluster allocation.
GRPO Configuration Files – Qwen3 (235B and 32B)
examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml, grpo-qwen3-235b-32n8g-async-1off.yaml, grpo-qwen3-32b-4n8g.yaml, grpo-qwen3-32b-8n8g-async-1off.yaml
Added GRPO configs for Qwen3 235B and 32B models with Megatron parallelism, importance sampling correction, and corresponding async-1off variants with asynchronous GRPO settings.
GRPO Configuration Files – Qwen3 (30B A3B)
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml, grpo-qwen3-30ba3b-4n8g-async-1off.yaml
Added GRPO configs for Qwen3 30B A3B variant with standard and async-1off setups, specifying tensor/model/expert parallelism and sequence parallelism.
Performance Test Infrastructure
tests/test_suites/llm/performance/common.env
Added shared bash environment setup for LLM performance tests, defining functions for early exit on max steps, computing config paths, and setting up experiment directories.
GRPO Performance Test Scripts – DeepSeek v3
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh, grpo-deepseek-v3-64n8g-async-1off.sh
Added test harnesses executing GRPO training via uv run, with TensorBoard-to-JSON conversion and conditional metrics validation on token probability error thresholds.
GRPO Performance Test Scripts – LLaMA 3.1
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh, grpo-llama3.1-8b-instruct-2n8g-async-1off.sh
Added test harnesses for LLaMA 3.1 with experiment configuration, logging, checkpointing, and metrics validation based on train/loss thresholds.
GRPO Performance Test Scripts – Qwen3
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh, grpo-qwen3-235b-32n8g-async-1off.sh, grpo-qwen3-30ba3b-4n8g.sh, grpo-qwen3-30ba3b-4n8g-async-1off.sh, grpo-qwen3-32b-4n8g.sh, grpo-qwen3-32b-8n8g-async-1off.sh
Added test scripts for Qwen3 model variants with identical structure: environment setup, experiment execution, log conversion, and conditional metrics checks.
Test Suite Registry
tests/test_suites/performance.txt
Added listing of 15 new GRPO performance test script paths under GRPO section.
Test Orchestration
tests/unit/test_recipes_and_test_suites.py
Integrated the performance test suite into test discovery by adding a performance_test_suite() fixture and a performance_test_suite_path constant, and by updating the all_test_suites fixture and the test_test_suites_exist parametrization.
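The shared common.env helpers referenced throughout the table above can be sketched as follows. Only exit_if_max_steps_reached appears by name in this review (see the code-graph section later in the page); compute_config_path and its path-mapping convention are hypothetical illustrations, not the repository's actual implementation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of helpers a common.env for performance tests might
# provide; the real file's names and logic may differ.

# Derive the recipe YAML path from the driver script's own location, e.g.
# tests/test_suites/llm/performance/foo.sh
#   -> examples/configs/recipes/llm/performance/foo.yaml
# (driver filenames mirror the YAML base names per the coding guidelines).
compute_config_path() {
    local script_path="$1"
    local base
    base="$(basename "$script_path" .sh)"
    echo "examples/configs/recipes/llm/performance/${base}.yaml"
}

# Exit early when a previous (resumed) run already reached MAX_STEPS, so
# multi-run CI jobs do not redo finished work. The step-count file location
# is an assumption for illustration.
exit_if_max_steps_reached() {
    local step_file="$1" max_steps="$2"
    if [[ -f "$step_file" ]] && (( $(cat "$step_file") >= max_steps )); then
        echo "Max steps (${max_steps}) already reached; exiting."
        exit 0
    fi
}
```

A driver script would source common.env, call exit_if_max_steps_reached, then invoke training with uv run against the computed config path.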

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Breadth of changes: 23 new files (10 configs, 10 test scripts, 1 common.env, 1 registry, 1 Python update) across multiple model variants and async variants.
  • Homogeneity: High — test scripts and config files follow consistent, repetitive patterns with model-specific parameter variations, reducing per-file review complexity.
  • Key areas requiring attention:
    • Verify parameter consistency across standard and async-1off config pairs (e.g., max_trajectory_age_steps=1, in_flight_weight_updates=true).
    • Validate Megatron parallelism settings (tensor/pipeline/expert sizes) are correctly proportioned for node/GPU counts in each config.
    • Ensure metric thresholds in test scripts (e.g., train/token_mult_prob_error < 1.1) are appropriate for each model size.
    • Confirm common.env path resolution and CONFIG_PATH mapping logic correctly maps test script locations to config paths.

Possibly related PRs

  • NVIDIA-NeMo/RL#1098: Introduces async_grpo training infrastructure and async utilities that are directly consumed by the async-1off variants in this PR.
  • NVIDIA-NeMo/RL#1322: Overlaps with nearly identical LLM performance recipe configs, test scripts, and harness files as this PR.

Suggested labels

r0.4.0, CI:L0

Suggested reviewers

  • guyueh1
  • terrykong

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve coverage.
  • Test Results For Major Changes ⚠️ Warning: the PR adds 10+ YAML configs and 11+ test scripts for GRPO performance recipes but lacks test results, validation data, or execution confirmation in the PR description. Document test execution results and baseline metrics (mean(train/token_mult_prob_error) values), verify that configs match deployment specs, and address the review comments about log naming and tensorboard settings.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title clearly identifies the main change, cherry-picking #1322 (onboarding performance recipes in tests) into the r0.4.0 release branch.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml (1)

27-27: Add trailing newline.

YAML files should end with a trailing newline per standard conventions.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 399d22f and 4676af3.

📒 Files selected for processing (23)
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-async-1off.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g-async-1off.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-8n8g-async-1off.yaml (1 hunks)
  • tests/test_suites/llm/performance/common.env (1 hunks)
  • tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g-async-1off.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-qwen3-32b-8n8g-async-1off.sh (1 hunks)
  • tests/test_suites/performance.txt (1 hunks)
  • tests/unit/test_recipes_and_test_suites.py (3 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
tests/test_suites/**

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Place driver shell scripts and common.env under tests/test_suites/<domain>/ and list nightly tests in tests/test_suites/nightly.txt

Files:

  • tests/test_suites/performance.txt
  • tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-qwen3-32b-8n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
  • tests/test_suites/llm/performance/common.env
  • tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh
examples/configs/recipes/**/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

examples/configs/recipes/**/*.yaml: Recipe YAMLs under examples/configs/recipes/** are runnable snapshots and may omit documentation
When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name

Files:

  • examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-8n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml
examples/configs/recipes/**/*.{yaml,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script

Files:

  • examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-8n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml
examples/configs/recipes/**

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Place recipe YAMLs under examples/configs/recipes/<domain>/

Files:

  • examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-8n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Use uv run to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling python directly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts

Files:

  • tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-qwen3-32b-8n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • tests/unit/test_recipes_and_test_suites.py
🧠 Learnings (17)
📓 Common learnings
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2025-09-20T14:59:08.052Z
Learning: If a change could affect performance, include before-and-after performance numbers in the PR description, along with configuration and context.
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/nightly.txt : Append the new driver script path (relative to tests/test_suites/) to tests/test_suites/nightly.txt

Applied to files:

  • tests/test_suites/performance.txt
  • tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh
  • tests/unit/test_recipes_and_test_suites.py
📚 Learning: 2025-09-20T14:59:08.052Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: coderabbit-custom-pre-merge-checks-unique-id-file-non-traceable-F7F2B60C-1728-4C9A-8889-4F2235E186CA.txt:0-0
Timestamp: 2025-09-20T14:59:08.052Z
Learning: If a change could affect performance, include before-and-after performance numbers in the PR description, along with configuration and context.

Applied to files:

  • tests/test_suites/performance.txt
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.

Applied to files:

  • tests/test_suites/performance.txt
  • tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-qwen3-32b-8n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
  • tests/test_suites/llm/performance/common.env
  • tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh
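The NUM_RUNS expression in the learning above is shell integer ceiling division; a minimal illustration with made-up values:

```shell
#!/usr/bin/env bash
# Standard test-infrastructure variables from the learning above; the
# values here are illustrative only.
STEPS_PER_RUN=10
MAX_STEPS=45

# Integer ceiling of MAX_STEPS / STEPS_PER_RUN: 45 steps at 10 steps per
# run needs 5 runs (4 full runs plus one partial run).
NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))
echo "$NUM_RUNS"
```

This is why NUM_RUNS and its inputs should not be flagged as unused: they are consumed by external launch tooling even when the script body never references them.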
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/llm/*.sh : LLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension

Applied to files:

  • tests/test_suites/performance.txt
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh
  • tests/test_suites/llm/performance/common.env
  • tests/unit/test_recipes_and_test_suites.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/llm/*.yaml : LLM recipe YAML filenames must follow: <algo>-<model>-<nodes>n<gpus>g-<strategy-and-params>[-modifiers][-long][.vN].yaml

Applied to files:

  • examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-8n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/**/*.{sh} : For new model support, add a matching driver shell script under tests/test_suites/<domain>/ that sources common.env and invokes 'uv run ... --config <yaml>'

Applied to files:

  • tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
  • tests/test_suites/llm/performance/common.env
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.

Applied to files:

  • tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-qwen3-32b-8n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
  • tests/test_suites/llm/performance/common.env
  • tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh
📚 Learning: 2025-09-19T07:28:29.887Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh:1-4
Timestamp: 2025-09-19T07:28:29.887Z
Learning: The NVIDIA-NeMo/RL project prefers to maintain consistent formatting across test scripts rather than applying individual bash hardening improvements like `set -euo pipefail` or proper quoting for sourcing files.

Applied to files:

  • tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g-async-1off.sh
  • tests/test_suites/llm/performance/common.env
  • tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.{yaml,sh} : Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script

Applied to files:

  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-8n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/vlm/*.yaml : VLM recipe YAML filenames must follow: vlm_<algo>-<model>-<nodes>n<gpus>g-<strategy>[-modifiers][.vN].yaml

Applied to files:

  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-8n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
📚 Learning: 2025-09-18T14:57:31.003Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1006
File: nemo_rl/algorithms/distillation.py:312-354
Timestamp: 2025-09-18T14:57:31.003Z
Learning: The distillation algorithm's cluster setup logic is designed to follow the same patterns used in GRPO for handling distributed training clusters and resource allocation.

Applied to files:

  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/recipes/**/*.yaml : When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name

Applied to files:

  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-8n8g-async-1off.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to examples/configs/*.yaml : When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/*.yaml

Applied to files:

  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
📚 Learning: 2025-09-19T03:00:58.662Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:85-101
Timestamp: 2025-09-19T03:00:58.662Z
Learning: In distillation and GRPO configurations, max_new_tokens is intentionally set to the full context window (max_total_sequence_length) for consistency across the codebase. Overflow cases when prompt + generation tokens exceed max_model_len are handled by safeguards implemented in vllm_worker.py.

Applied to files:

  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/** : Place driver shell scripts and common.env under tests/test_suites/<domain>/ and list nightly tests in tests/test_suites/nightly.txt

Applied to files:

  • tests/test_suites/llm/performance/common.env
  • tests/unit/test_recipes_and_test_suites.py
📚 Learning: 2025-09-20T14:58:45.492Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-09-20T14:58:45.492Z
Learning: Applies to tests/test_suites/vlm/*.sh : VLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension

Applied to files:

  • tests/test_suites/llm/performance/common.env
🧬 Code graph analysis (10)
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh (1)
tests/test_suites/llm/performance/common.env (1)
  • exit_if_max_steps_reached (12-20)
tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh (1)
tests/test_suites/llm/performance/common.env (1)
  • exit_if_max_steps_reached (12-20)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh (1)
tests/test_suites/llm/performance/common.env (1)
  • exit_if_max_steps_reached (12-20)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
  • exit_if_max_steps_reached (12-20)
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
  • exit_if_max_steps_reached (12-20)
tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g-async-1off.sh (1)
tests/test_suites/llm/performance/common.env (1)
  • exit_if_max_steps_reached (12-20)
tests/test_suites/llm/performance/grpo-qwen3-32b-8n8g-async-1off.sh (1)
tests/test_suites/llm/performance/common.env (1)
  • exit_if_max_steps_reached (12-20)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
  • exit_if_max_steps_reached (12-20)
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
  • exit_if_max_steps_reached (12-20)
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
  • exit_if_max_steps_reached (12-20)
🪛 Shellcheck (0.11.0)
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh
  • [warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails. (SC2164)
  • [error] 28-28: Double quote array expansions to avoid re-splitting elements. (SC2068)

tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh
  • [warning] 10-10: NUM_NODES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 13-13: NUM_RUNS appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 14-14: NUM_MINUTES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 20-20: Use 'cd ... || exit' or 'cd ... || return' in case cd fails. (SC2164)
  • [error] 34-34: Double quote array expansions to avoid re-splitting elements. (SC2068)

tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh
  • [warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails. (SC2164)
  • [error] 28-28: Double quote array expansions to avoid re-splitting elements. (SC2068)

tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh
  • [warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails. (SC2164)
  • [error] 28-28: Double quote array expansions to avoid re-splitting elements. (SC2068)

tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh
  • [warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails. (SC2164)
  • [error] 28-28: Double quote array expansions to avoid re-splitting elements. (SC2068)

tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g-async-1off.sh
  • [warning] 8-8: NUM_NODES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 11-11: NUM_RUNS appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 12-12: NUM_MINUTES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 18-18: Use 'cd ... || exit' or 'cd ... || return' in case cd fails. (SC2164)
  • [error] 29-29: Double quote array expansions to avoid re-splitting elements. (SC2068)

tests/test_suites/llm/performance/grpo-qwen3-32b-8n8g-async-1off.sh
  • [warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails. (SC2164)
  • [error] 28-28: Double quote array expansions to avoid re-splitting elements. (SC2068)

tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
  • [warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails. (SC2164)
  • [error] 28-28: Double quote array expansions to avoid re-splitting elements. (SC2068)

tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh
  • [warning] 10-10: NUM_NODES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 13-13: NUM_RUNS appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 14-14: NUM_MINUTES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 20-20: Use 'cd ... || exit' or 'cd ... || return' in case cd fails. (SC2164)
  • [error] 34-34: Double quote array expansions to avoid re-splitting elements. (SC2068)

tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh
  • [warning] 8-8: NUM_NODES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 11-11: NUM_RUNS appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 12-12: NUM_MINUTES appears unused. Verify use (or export if used externally). (SC2034)
  • [warning] 18-18: Use 'cd ... || exit' or 'cd ... || return' in case cd fails. (SC2164)
  • [error] 29-29: Double quote array expansions to avoid re-splitting elements. (SC2068)
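Taken together, the recurring SC2034 and SC2164 findings map to two idiomatic fixes. The sketch below is illustrative only: the variable names come from the warnings, but the surrounding script bodies are not shown here, and PROJECT_ROOT is a placeholder.

```shell
# SC2034: variables consumed by an external launcher or a sourced script
# should be exported; otherwise shellcheck correctly flags them as unused
# within this file.
export NUM_NODES=4
export NUM_RUNS=1
export NUM_MINUTES=120

# SC2164: abort if cd fails rather than running later commands in the
# wrong directory. PROJECT_ROOT here is a stand-in for illustration.
PROJECT_ROOT="$(pwd)"
cd "$PROJECT_ROOT" || exit 1
```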

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (27)
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml (1)

1-44: LGTM! Configuration follows established patterns and is consistent with other GRPO recipe files in this PR. Filename, inheritance structure, and all settings align with the GRPO infrastructure.

examples/configs/recipes/llm/performance/grpo-qwen3-32b-8n8g-async-1off.yaml (1)

1-32: LGTM! Async variant correctly inherits from base config and properly enables async GRPO settings with importance sampling correction and async vLLM engine. Naming and configuration align with established patterns.

examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml (1)

1-59: LGTM! Configuration for large-scale Qwen3-235B model is well-structured with comprehensive distributed training settings (pipeline, context, and expert parallelism). The documented caveat about tensorboard is helpful context.

examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml (1)

1-41: LGTM! Base configuration for Qwen3-32B is properly structured. This serves as the parent config for async variants, with appropriate parallelism strategy for the 4-node cluster.

examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.yaml (1)

1-33: LGTM! Async variant properly enables async GRPO with trajectory management and importance sampling. Parallelism parameters are appropriately adjusted for the async configuration.

examples/configs/recipes/llm/performance/grpo-deepseek-v3-64n8g-async-1off.yaml (1)

1-33: LGTM! Large-scale async configuration for DeepSeek v3 is well-structured. Parallelism strategy (pure pipeline+expert with 32-way tensor parallelism for generation) is appropriate for the 64-node cluster.

tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh (2)

28-28: Consider quoting $@ to avoid word-splitting (SC2068).

Line 28 passes $@ unquoted, which shellcheck flags as an error: "Double quote array expansions to avoid re-splitting elements." While recent codebase guidance (zpqiu, PR 1324) indicates using $@ unquoted for consistency with existing test scripts, this is a potential source of subtle bugs if arguments contain spaces or special characters. For robustness, consider using "$@" instead.

Alternatively, if unquoted $@ is an established pattern in this codebase worth preserving, you can acknowledge this as an intentional trade-off for consistency.
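For illustration (a standalone toy, not the test script itself), the re-splitting behavior SC2068 warns about is easy to observe. The function and argument names below are hypothetical:

```shell
count_args() { echo "$#"; }

forward_unquoted() { count_args $@; }    # SC2068: arguments re-split on whitespace
forward_quoted()   { count_args "$@"; }  # each original argument preserved intact

forward_unquoted "policy.model_name=my model" cluster.num_nodes=2   # prints 3
forward_quoted   "policy.model_name=my model" cluster.num_nodes=2   # prints 2
```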


1-39: LGTM! Script follows established test infrastructure patterns: standard config variables, uv run usage, environment variable referencing, and conditional metrics logic are all correct. One minor shellcheck concern noted separately above.

examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml (1)

1-54: LGTM! Llama 3.1 8B Instruct configuration is well-structured for a 2-node cluster. The max_new_tokens=4096 setting correctly matches max_total_sequence_length per established patterns, with overflow handling delegated to vLLM.

tests/test_suites/performance.txt (1)

1-15: LGTM!

The performance test suite listing is well-organized, grouping standard and async variants separately. All listed paths correspond to test scripts included in this PR.

examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.yaml (1)

22-24: Verify log directory and experiment name match cluster configuration.

The log directory and WandB experiment name reference "1T1G" (suggesting 1 tensor parallel, 1 pipeline parallel) but the cluster configuration specifies 2 nodes with 8 GPUs per node. This naming inconsistency could cause confusion when analyzing results.

Consider aligning the naming with the cluster configuration or documenting the parallelism strategy if "1T1G" refers to a specific tensor/pipeline parallelism configuration.

tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g-async-1off.sh (1)

1-39: LGTM!

The script follows the established test infrastructure patterns correctly, including standard configuration variables, early-exit guards, experiment execution via uv run, and conditional metrics validation.

tests/test_suites/llm/performance/grpo-deepseek-v3-64n8g-async-1off.sh (1)

1-45: LGTM!

The script properly handles DeepSeek-specific requirements (NVLS disablement for OOM avoidance, configurable HF checkpoint path) while following the standard test infrastructure patterns.

tests/test_suites/llm/performance/common.env (1)

1-45: LGTM!

The common environment script provides robust infrastructure for performance tests, including proper error handling (set -eou pipefail), path validation, early-exit optimization, and dry-run support.
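As a sketch only: the real exit_if_max_steps_reached lives in common.env (lines 12-20) and is not shown here, so the log format and matching logic below are assumptions, not the actual implementation. The general shape of such an early-exit guard might be:

```shell
# Hypothetical early-exit guard: skip re-running an experiment whose log
# already shows the final training step. The "step: N" log format is an
# assumption for illustration.
exit_if_max_steps_reached() {
  local run_log="$1" max_steps="$2"
  if [ -f "$run_log" ] && grep -q "step: ${max_steps}" "$run_log"; then
    echo "Max steps (${max_steps}) already reached in ${run_log}; skipping."
    exit 0
  fi
}

# Demo in a subshell so the early exit does not terminate the calling shell.
demo_log=$(mktemp)
echo "step: 5" > "$demo_log"
( exit_if_max_steps_reached "$demo_log" 5 )
```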

tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh (1)

1-39: LGTM!

The script follows the established test infrastructure patterns correctly for the LLaMA 3.1 8B instruct model performance testing.

tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g-async-1off.sh (1)

1-40: LGTM!

The script follows the established test infrastructure patterns correctly for the Qwen3 235B model, including NVLS disablement to avoid OOM issues on the large-scale 32-node deployment.

tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g-async-1off.sh (1)

1-39: LGTM! Script follows the established test infrastructure pattern.

The script correctly implements the standard GRPO performance test flow: environment setup, early-exit guard, experiment execution with proper logging and checkpointing, TensorBoard log conversion, and conditional metrics validation.

tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh (2)

1-39: Script implementation follows the standard pattern correctly.

Assuming the filename is verified to be correct, the script logic properly implements the GRPO performance test flow with appropriate configuration, logging, and metrics validation.


1-1: The shell script filename is correct and not a typo. The verification shows that grpo-qwen3-30ba3b-4n8g.sh has a corresponding YAML config at grpo-qwen3-30ba3b-4n8g.yaml, and the naming is consistent across both files. The "30ba3b" is an intentional model variant identifier, not a naming error. The shell script filename properly mirrors the YAML base name as required by the coding guidelines.

Likely an incorrect or invalid review comment.

tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh (1)

1-45: LGTM! DeepSeek-specific configuration is appropriately handled.

The script correctly implements DeepSeek v3 requirements including NVLS disablement for OOM avoidance, configurable HF checkpoint path, and proper model/tokenizer overrides. The higher resource allocation (32 nodes, 240 minutes) aligns with the larger model scale.

tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh (1)

1-6: LGTM! Environment setup is appropriate for large model testing.

The NVLS disablement is properly documented and necessary to avoid OOM issues with the 235B model.
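For context, NVLS disablement is typically done through NCCL's environment variable; whether this script uses exactly this mechanism is an assumption, since its body is not reproduced here:

```shell
# NCCL_NVLS_ENABLE=0 disables NVLink SHARP (NVLS) collectives, trading some
# collective bandwidth for lower GPU memory pressure. The variable is a real
# NCCL knob; its use in this particular script is assumed, not confirmed.
export NCCL_NVLS_ENABLE=0
```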

tests/unit/test_recipes_and_test_suites.py (4)

31-31: LGTM! Performance test suite path properly defined.

The constant follows the established naming pattern for test suite paths.


72-80: LGTM! Performance test suite fixture correctly implemented.

The fixture follows the same pattern as nightly_test_suite and release_test_suite, ensuring consistency across the test infrastructure.


83-89: LGTM! Performance suite properly integrated into test aggregation.

The all_test_suites fixture correctly includes the new performance_test_suite parameter and concatenates it with nightly and release suites.


102-113: LGTM! Test parametrization updated to include performance suite.

The test_test_suites_exist test now validates all three test suites (nightly, release, and performance) with appropriate test IDs.

examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml (1)

1-57: LGTM! Configuration is comprehensive and follows established patterns.

The YAML properly configures:

  • GRPO training parameters with appropriate batch sizes for the scale
  • Megatron parallelism settings for DeepSeek v3 (EP16, PP16)
  • vLLM generation with async engine (TP32)
  • Proper cluster resource allocation (32 nodes × 8 GPUs)

The configuration aligns with the corresponding test script and follows the project's recipe naming conventions.

tests/test_suites/llm/performance/grpo-qwen3-32b-8n8g-async-1off.sh (1)

1-39: LGTM! Script correctly implements the GRPO performance test pattern.

The script properly configures and executes the GRPO Qwen3-32B experiment with appropriate logging, checkpointing, and metrics validation.

Comment on lines +30 to +32
  log_dir: logs/grpo-qwen3-235b-32n8g-16T16G-async-1off
  wandb:
    name: grpo-qwen3-235b-32n8g-16T16G-async-1off

⚠️ Potential issue | 🟠 Major

Fix inconsistent log directory and experiment name.

The log directory and WandB experiment name reference "16T16G" but this configuration is for a 32-node, 8-GPU-per-node cluster (32n8g). This appears to be a copy-paste error from the 16n8g variant.

Apply this diff:

 logger:
-  log_dir: logs/grpo-qwen3-235b-32n8g-16T16G-async-1off
+  log_dir: logs/grpo-qwen3-235b-32n8g-async-1off
   wandb:
-    name: grpo-qwen3-235b-32n8g-16T16G-async-1off
+    name: grpo-qwen3-235b-32n8g-async-1off
🤖 Prompt for AI Agents
In
examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g-async-1off.yaml
around lines 30-32, the log_dir and wandb.name still contain "16T16G" from the
16n8g copy; update both entries so they reference the correct 32n8g
configuration (e.g., replace "16T16G" with "32T8G" so both log_dir and
wandb.name become consistent with the 32n8g-async-1off run).

Comment on lines +18 to +30
cd $PROJECT_ROOT
uv run examples/run_grpo_math.py \
    --config $CONFIG_PATH \
    grpo.max_num_steps=$MAX_STEPS \
    logger.log_dir=$LOG_DIR \
    logger.wandb_enabled=True \
    logger.wandb.project=nemo-rl \
    logger.wandb.name=$EXP_NAME \
    logger.monitor_gpus=True \
    checkpointing.enabled=True \
    checkpointing.checkpoint_dir=$CKPT_DIR \
    $@ \
    2>&1 | tee $RUN_LOG

⚠️ Potential issue | 🟠 Major

Add missing tensorboard_enabled configuration.

The script converts TensorBoard logs at line 33 but doesn't enable TensorBoard logging in the experiment configuration. This is inconsistent with other performance test scripts and may cause the log conversion to fail or produce empty results.

Apply this diff to enable TensorBoard logging:

     logger.monitor_gpus=True \
+    logger.tensorboard_enabled=True \
     checkpointing.enabled=True \
🧰 Tools
🪛 Shellcheck (0.11.0)
  • [warning] 18-18: Use 'cd ... || exit' or 'cd ... || return' in case cd fails. (SC2164)
  • [error] 29-29: Double quote array expansions to avoid re-splitting elements. (SC2068)

🤖 Prompt for AI Agents
In tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh around lines 18 to
30, the experiment CLI call enables WandB but does not enable TensorBoard, yet
the script later converts TensorBoard logs; add the missing
logger.tensorboard_enabled=True flag to the uv run arguments (alongside the
existing logger.wandb.* and logger.monitor_gpus entries) so TensorBoard logging
is actually enabled for the run.

@terrykong terrykong merged commit c7c6b1d into r0.4.0 Nov 10, 2025
66 of 69 checks passed
@terrykong terrykong deleted the cherry-pick-1322-r0.4.0 branch November 10, 2025 20:50
@coderabbitai coderabbitai Bot mentioned this pull request Jan 13, 2026
4 tasks
@coderabbitai coderabbitai Bot mentioned this pull request Feb 18, 2026
4 tasks