feat: Onboard perf recipes in tests by guyueh1 · Pull Request #1322 · NVIDIA-NeMo/RL

guyueh1 · 2025-10-08T23:48:26Z

What does this PR do ?

Onboard some recipes for short perf testing in CI


Summary by CodeRabbit

New Features
- Added GRPO performance testing configurations for multiple large language models including DeepSeek v3, Llama 3.1/3.3, and Qwen3 series.
- Introduced automated performance test scripts supporting single-node and multi-node distributed training setups.
- Added integrated logging with WandB and TensorBoard, GPU monitoring, and metric validation capabilities.

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

guyueh1 · 2025-10-13T23:14:18Z

@youngeunkwon0405 please refer to this PR and add perf recipe+script for the two async benchmarks as well

Signed-off-by: Guyue Huang <guyueh@nvidia.com>


coderabbitai · 2025-10-21T20:16:26Z

📝 Walkthrough


This PR introduces a comprehensive GRPO performance testing infrastructure with configuration files for multiple LLM models (DeepSeek, Llama, Qwen variants) across various hardware configurations, corresponding test execution scripts, and shared test utilities.

Changes

| Cohort / File(s) | Summary |
|---|---|
| GRPO Configuration Files - DeepSeek: `examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml` | Configuration for a DeepSeek v3 32-node deployment with GRPO parameters, Megatron-LM distributed training settings, FP8 configurations, and cluster specifications. |
| GRPO Configuration Files - Llama 3.1: `examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-1n8g.yaml`, `grpo-llama3.1-8b-instruct-2n8g.yaml` | Configurations for Llama 3.1 8B Instruct with 1-node and 2-node setups, including Megatron parallelism settings, importance sampling loss, and validation configurations. |
| GRPO Configuration Files - Llama 3.3: `examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g.yaml`, `grpo-llama3.3-70b-instruct-4n8g-16k.yaml` | Configurations for Llama 3.3 70B Instruct with 4-node deployments, supporting 4096 and 16384 sequence lengths with tensor/pipeline/context parallelism. |
| GRPO Configuration Files - Qwen3: `examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml`, `grpo-qwen3-235b-32n8g.yaml`, `grpo-qwen3-30ba3b-4n8g.yaml`, `grpo-qwen3-32b-4n8g.yaml`, `grpo-qwen3-32b-4n8g-fp8.yaml`, `grpo-qwen3-32b-4n8g-fp8-16k.yaml` | Configurations for various Qwen3 model variants with distributed parallelism, including FP8-specific configurations with blockwise quantization and deep GEMM optimizations. |
| Test Infrastructure - Common Utilities: `tests/test_suites/llm/performance/common.env` | Shared environment setup script defining functions for early exit logic, path translations, directory creation, and Python path configuration for performance tests. |
| Test Execution Scripts: `tests/test_suites/llm/performance/grpo-*.sh` | Bash scripts for each configuration variant orchestrating experiment execution via `uv run`, log conversion from TensorBoard to JSON, and conditional metric validation with jq and Python checkers. |
| Test Suite Registry: `tests/test_suites/performance.txt` | Index file listing GRPO performance test script paths for test discovery and execution. |

Sequence Diagram(s)

sequenceDiagram
    participant Test Script as Test Script<br/>(grpo-*.sh)
    participant Env as common.env
    participant Runner as uv run<br/>examples/run_grpo_math.py
    participant TBoard as TensorBoard Logs
    participant Converter as json_dump_tb_logs.py
    participant Metrics as JSON Metrics
    participant Validator as check_metrics.py

    Test Script->>Env: source common.env
    Env->>Env: validate config exists<br/>setup directories<br/>translate paths
    Env-->>Test Script: environment ready

    Test Script->>Runner: execute with config<br/>+ logging + checkpointing
    Runner->>TBoard: generate logs
    Runner-->>Test Script: experiment complete

    Test Script->>Converter: convert logs to JSON
    Converter->>Metrics: write metrics
    Converter-->>Test Script: conversion done

    alt max_steps reached
        Test Script->>Validator: check constraints
        Validator->>Metrics: read train/token_mult_prob_error
        Validator-->>Test Script: validation result
    else max_steps not reached
        Test Script-->>Test Script: skip validation
    end
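
The control flow in the diagram can be sketched as a small shell function. This is only an illustration, not the PR's actual script: `run_experiment`, `convert_logs`, and `validate_metrics` are hypothetical stand-ins for the `uv run`, `json_dump_tb_logs.py`, and `check_metrics.py` steps, and the step counts are passed in as arguments instead of being read from real logs.

```shell
#!/usr/bin/env bash
# Hypothetical stubs standing in for the real steps named in the diagram.
run_experiment()   { echo "run experiment"; }
convert_logs()     { echo "convert logs to JSON"; }
validate_metrics() { echo "check metrics"; }

# Run the pipeline; validate only when the last logged step reached max_steps.
run_perf_test() {
  local last_step=$1 max_steps=$2
  run_experiment
  convert_logs
  if [ "$last_step" -ge "$max_steps" ]; then
    validate_metrics
  else
    echo "skip validation"
  fi
}

run_perf_test 10 10   # final line printed: check metrics
run_perf_test 4 10    # final line printed: skip validation
```

Gating validation on the step count means a run that was cut short is not reported as a metric failure.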

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

The changes are primarily configuration and shell script additions following established patterns. While there is significant file volume (23+ files), they are homogeneous—each YAML config and test script shares a consistent structure with minimal logic. The variations are primarily in hyperparameter values, model names, and cluster configurations rather than architectural differences.

Possibly related PRs

feat: FP8 Training in Megatron Path #971: Megatron FP8 training enhancements directly correlate with the FP8-specific GRPO configurations being added (e.g., grpo-qwen3-32b-4n8g-fp8.yaml, grpo-qwen3-32b-4n8g-fp8-16k.yaml), which configure FP8 quantization, blockwise settings, and related parameters.

Suggested labels

CI:L1, Run CICD

Suggested reviewers

terrykong
yuki-97
parthchadha

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The PR title "feat: Onboard perf recipes in tests" directly and accurately describes the primary change in the changeset. The PR adds a comprehensive set of performance recipe configuration files (.yaml) for various LLM models (DeepSeek, Llama, Qwen) along with corresponding test scripts and environment setup files to the tests directory. The term "perf recipes" clearly maps to the performance configuration files being added, and "in tests" accurately reflects that these are being integrated into the test infrastructure. The title is concise, specific, and avoids vague terminology while capturing the main objective without requiring coverage of every implementation detail. |
| Test Results For Major Changes | ✅ Passed | This PR adds configuration files and test scripts for performance testing of GRPO recipes across various models and hardware configurations. These are configuration and test infrastructure additions, not modifications to core codebase functionality, APIs, or product features. According to the check criteria, test results documentation is required only for major changes such as new features, breaking changes, or significant refactoring. Since these are minor configuration and test infrastructure additions that do not affect core functionality or introduce breaking changes, the check passes regardless of explicit test result documentation. |


coderabbitai

Actionable comments posted: 26

🧹 Nitpick comments (5)

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0a769cc and d0d54c3.

📒 Files selected for processing (24)

examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-1n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8.yaml (1 hunks)
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml (1 hunks)
tests/test_suites/llm/performance/common.env (1 hunks)
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-1n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.sh (1 hunks)
tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh (1 hunks)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.sh (1 hunks)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8.sh (1 hunks)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh (1 hunks)
tests/test_suites/performance.txt (1 hunks)

🧰 Additional context used

📓 Path-based instructions (5)

examples/configs/recipes/**/*.yaml
examples/configs/recipes/**/*.{yaml,sh}
examples/configs/recipes/**

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Files (matched by all three patterns above):

examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-1n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g.yaml
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml
examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.yaml

**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Files:

tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8.sh
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.sh
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-1n8g.sh
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.sh
tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g.sh

tests/test_suites/**

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Files:

The 11 scripts listed under **/*.sh above, plus:

tests/test_suites/llm/performance/common.env
tests/test_suites/performance.txt

🧠 Learnings (1)

📚 Learning: 2025-10-12T14:46:57.171Z

Learnt from: zpqiu
PR: NVIDIA-NeMo/RL#1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.

Applied to files:

tests/test_suites/llm/performance/common.env
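
The `NUM_RUNS` expression in the learning above is integer ceiling division. A minimal sketch follows, assuming illustrative values that are not taken from any script in this PR; the `num_runs` helper is hypothetical and exists only to make the arithmetic easy to check.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ceiling of max_steps / steps_per_run using integer math.
num_runs() {
  local max_steps=$1 steps_per_run=$2
  echo $(( (max_steps + steps_per_run - 1) / steps_per_run ))
}

# Illustrative header in the style the learning describes (values are made up).
NUM_NODES=4          # per the learning, consumed by external launch tooling
STEPS_PER_RUN=10
MAX_STEPS=25
NUM_RUNS=$(num_runs "$MAX_STEPS" "$STEPS_PER_RUN")
NUM_MINUTES=60       # per the learning, consumed by external launch tooling

echo "NUM_RUNS=$NUM_RUNS"   # prints NUM_RUNS=3
```

Adding `STEPS_PER_RUN - 1` before dividing rounds any partial final run up to a whole launch, so 25 steps at 10 steps per run needs 3 launches instead of 2.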

🧬 Code graph analysis (8)

tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8.sh (1)

tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh (1)

tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh (1)

tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh (1)

tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh (1)

tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-1n8g.sh (1)

tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh (1)

tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.sh (1)

🪛 Shellcheck (0.11.0)

tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh

[warning] 10-10: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 13-13: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 14-14: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 20-20: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

[error] 34-34: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g.sh

[warning] 8-8: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 11-11: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 12-12: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 18-18: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

[error] 29-29: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh

[warning] 8-8: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 11-11: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 12-12: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 18-18: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

[error] 29-29: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-1n8g.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)

tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g.sh

[warning] 6-6: NUM_NODES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 9-9: NUM_RUNS appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 10-10: NUM_MINUTES appears unused. Verify use (or export if used externally).

(SC2034)

[warning] 16-16: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.

(SC2164)

[error] 28-28: Double quote array expansions to avoid re-splitting elements.

(SC2068)
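
Two of the recurring findings, SC2164 and SC2068, have standard fixes. A minimal sketch follows, using hypothetical helper functions and a temporary directory instead of the PR's actual scripts:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints how many arguments it received.
count_args() {
  echo "$#"
}

forward_unquoted() {
  # SC2068: an unquoted $@ is re-split on whitespace.
  # shellcheck disable=SC2068
  count_args $@
}

forward_quoted() {
  # Fix: "$@" passes each original argument through intact.
  count_args "$@"
}

# SC2164 fix: stop if cd fails instead of continuing in the wrong directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

forward_unquoted "a b" c   # prints 3: "a b" was split into two words
forward_quoted "a b" c     # prints 2: arguments preserved

# Return to the previous directory and remove the temporary one.
cd - >/dev/null || exit 1
rmdir "$workdir"
```

The SC2034 warnings are a different case: the learning recorded earlier in the review notes that `NUM_NODES`, `NUM_RUNS`, and `NUM_MINUTES` are read by external tooling, so they are expected to look unused inside each script.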

🔇 Additional comments (8)

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Co-authored-by: Zhiyu Li <zhiyul@NVIDIA.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

guyueh1 · 2025-11-10T17:34:55Z

@terrykong L0 CI is passing, should we run more CI?

Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Signed-off-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com>
Co-authored-by: Youngeun Kwon <youngeunk@nvidia.com>
Co-authored-by: Zhiyu Li <zhiyul@NVIDIA.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>

This trailer block appears four times on the page. The copies differ only in an additional final sign-off:

Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
(no additional sign-off)
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

guyueh1 added 2 commits October 8, 2025 16:46

Attempt to onboard these perf tests

826e9d8

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

fix recipe

a69d960

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

guyueh1 and others added 4 commits October 15, 2025 10:24

Fix 235b

9240afb

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

Merge branch 'main' into guyueh/perf_tests

14ac6f3

Add tests for 70b and 235b at 32n

4987695

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

Merge branch 'guyueh/perf_tests' of ssh://github.com/NVIDIA-NeMo/RL into guyueh/perf_tests

d0d54c3

guyueh1 marked this pull request as ready for review October 21, 2025 20:09

guyueh1 requested review from a team as code owners October 21, 2025 20:09

coderabbitai Bot reviewed Oct 21, 2025


youngeunkwon0405 reviewed Oct 22, 2025


Comment thread examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml

youngeunkwon0405 reviewed Oct 22, 2025


Comment thread examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml Outdated

guyueh1 and others added 5 commits October 27, 2025 11:25

Merge branch 'main' into guyueh/perf_tests

0eefea3

feat: async performance recipe (under test) (#1414)

f14b5bf

Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>

Fix stop token id in some configs and deepseek gbs

9a65398

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

update performance.txt

2f184ee

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

guyueh1 requested review from a team as code owners November 3, 2025 22:17

github-actions Bot added the Documentation label (Improvements or additions to documentation) Nov 3, 2025

guyueh1 added and removed the CI:L0 label (Run doctests and unit tests) Nov 8, 2025

guyueh1 temporarily deployed to nemo-ci November 8, 2025 00:19 — with GitHub Actions Inactive

guyueh1 temporarily deployed to nemo-ci November 8, 2025 00:20 — with GitHub Actions Inactive

Fix import config

b1a3b10

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

guyueh1 added and removed the CI:L0 label (Run doctests and unit tests) Nov 9, 2025

guyueh1 temporarily deployed to nemo-ci November 9, 2025 18:41 — with GitHub Actions Inactive

guyueh1 temporarily deployed to nemo-ci November 9, 2025 19:04 — with GitHub Actions Inactive

Fix unit test

ef51083

Signed-off-by: Guyue Huang <guyueh@nvidia.com>

guyueh1 added and removed the CI:L0 label (Run doctests and unit tests) Nov 9, 2025

guyueh1 temporarily deployed to nemo-ci November 9, 2025 22:12 — with GitHub Actions Inactive

guyueh1 temporarily deployed to nemo-ci November 9, 2025 22:32 — with GitHub Actions Inactive

terrykong enabled auto-merge (squash) November 10, 2025 20:17

terrykong approved these changes Nov 10, 2025


terrykong merged commit 3350ba2 into main Nov 10, 2025
41 of 42 checks passed

terrykong deleted the guyueh/perf_tests branch November 10, 2025 20:18

coderabbitai Bot mentioned this pull request Nov 10, 2025

cp: feat: Onboard perf recipes in tests (1322) into r0.4.0 #1497

Merged

coderabbitai Bot mentioned this pull request Dec 11, 2025

test: Add grpo-qwen3-30ba3b-4n8g-40k config to performance test suite. #1623

Merged

4 tasks

This was referenced Dec 18, 2025

Unable to run multi-node scripts using Enroot images #1657

Closed

test: Perf recipe for v0.5 #1661

Closed

cp: test: Perf recipe for v0.5 (1667) into r0.5.0 #1671

Merged

coderabbitai Bot mentioned this pull request Jan 6, 2026

fix: use median instead of mean for logprob error for stability in nightlies #1722

Merged

4 tasks

3 participants