feat: Onboard perf recipes in tests #1322

Merged
terrykong merged 20 commits into main from guyueh/perf_tests on Nov 10, 2025


@guyueh1
Contributor

@guyueh1 guyueh1 commented Oct 8, 2025

What does this PR do?

Onboard GRPO perf recipes (DeepSeek-V3, Llama 3.1/3.3, Qwen3) for short performance tests in CI.

Issues

List issues that this PR closes (syntax):

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • New Features
    • Added GRPO performance testing configurations for multiple large language models including DeepSeek v3, Llama 3.1/3.3, and Qwen3 series.
    • Introduced automated performance test scripts supporting single-node and multi-node distributed training setups.
    • Added integrated logging with WandB and TensorBoard, GPU monitoring, and metric validation capabilities.

Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1
Contributor Author

guyueh1 commented Oct 13, 2025

@youngeunkwon0405 please refer to this PR and add a perf recipe and script for the two async benchmarks as well.

guyueh1 and others added 4 commits October 15, 2025 10:24
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 marked this pull request as ready for review October 21, 2025 20:09
@guyueh1 guyueh1 requested review from a team as code owners October 21, 2025 20:09
@coderabbitai
Contributor

coderabbitai Bot commented Oct 21, 2025

📝 Walkthrough

This PR introduces a comprehensive GRPO performance testing infrastructure with configuration files for multiple LLM models (DeepSeek, Llama, Qwen variants) across various hardware configurations, corresponding test execution scripts, and shared test utilities.

Changes

Cohort / File(s) Summary
GRPO Configuration Files - DeepSeek
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
Configuration for DeepSeek v3 on 32 nodes with 8 GPUs each (32n8g), with GRPO parameters, Megatron-LM distributed training settings, FP8 configurations, and cluster specifications.
GRPO Configuration Files - Llama 3.1
examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-1n8g.yaml, grpo-llama3.1-8b-instruct-2n8g.yaml
Configurations for Llama 3.1 8B Instruct with 1-node and 2-node setups, including Megatron parallelism settings, importance sampling loss, and validation configurations.
GRPO Configuration Files - Llama 3.3
examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g.yaml, grpo-llama3.3-70b-instruct-4n8g-16k.yaml
Configurations for Llama 3.3 70B Instruct with 4-node deployments, supporting 4096 and 16384 sequence lengths with tensor/pipeline/context parallelism.
GRPO Configuration Files - Qwen3
examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml, grpo-qwen3-235b-32n8g.yaml, grpo-qwen3-30ba3b-4n8g.yaml, grpo-qwen3-32b-4n8g.yaml, grpo-qwen3-32b-4n8g-fp8.yaml, grpo-qwen3-32b-4n8g-fp8-16k.yaml
Configurations for various Qwen3 model variants with distributed parallelism, including FP8-specific configurations with blockwise quantization and DeepGEMM optimizations.
Test Infrastructure - Common Utilities
tests/test_suites/llm/performance/common.env
Shared environment setup script defining functions for early exit logic, path translations, directory creation, and Python path configuration for performance tests.
Test Execution Scripts
tests/test_suites/llm/performance/grpo-*.sh
Bash scripts for each configuration variant orchestrating experiment execution via uv run, log conversion from TensorBoard to JSON, and conditional metric validation with jq and Python checkers.
Test Suite Registry
tests/test_suites/performance.txt
Index file listing GRPO performance test script paths for test discovery and execution.
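As a rough sketch of how a registry like tests/test_suites/performance.txt could be consumed, the loop below reads one script path per line, skips blank lines and `#` comments, and runs each entry with bash. The function name run_registry and the DRY_RUN switch are hypothetical and not part of this PR; the actual CI launcher may work differently.

```shell
#!/usr/bin/env bash
# Hypothetical driver (not part of this PR): run every script listed in a
# registry file such as tests/test_suites/performance.txt.
# Assumes one script path per line; blank lines and "#" comments are skipped.
run_registry() {
  local registry="$1"
  local line
  while IFS= read -r line || [[ -n "$line" ]]; do
    # Trim leading and trailing whitespace.
    line="${line#"${line%%[![:space:]]*}"}"
    line="${line%"${line##*[![:space:]]}"}"
    [[ -z "$line" || "$line" == \#* ]] && continue
    if [[ "${DRY_RUN:-0}" == "1" ]]; then
      echo "would run: bash $line"
    else
      # Redirect stdin so the test script cannot consume the registry lines.
      bash "$line" < /dev/null || echo "FAILED: $line" >&2
    fi
  done < "$registry"
}
```

For example, `DRY_RUN=1 run_registry tests/test_suites/performance.txt` would print the commands without launching anything. The `< /dev/null` guards against a common `while read` pitfall where a child process swallows the rest of the input file.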

Sequence Diagram(s)

sequenceDiagram
    participant Test Script as Test Script<br/>(grpo-*.sh)
    participant Env as common.env
    participant Runner as uv run<br/>examples/run_grpo_math.py
    participant TBoard as TensorBoard Logs
    participant Converter as json_dump_tb_logs.py
    participant Metrics as JSON Metrics
    participant Validator as check_metrics.py

    Test Script->>Env: source common.env
    Env->>Env: validate config exists<br/>setup directories<br/>translate paths
    Env-->>Test Script: environment ready

    Test Script->>Runner: execute with config<br/>+ logging + checkpointing
    Runner->>TBoard: generate logs
    Runner-->>Test Script: experiment complete

    Test Script->>Converter: convert logs to JSON
    Converter->>Metrics: write metrics
    Converter-->>Test Script: conversion done

    alt max_steps reached
        Test Script->>Validator: check constraints
        Validator->>Metrics: read train/token_mult_prob_error
        Validator-->>Test Script: validation result
    else max_steps not reached
        Test Script-->>Test Script: skip validation
    end
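The alt branch in the diagram gates metric validation on whether the run actually reached max_steps. A minimal pure-bash sketch of that decision, assuming the logged training steps have already been extracted from the JSON metrics (the real scripts use jq for that step); max_logged_step and should_validate are hypothetical names:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the largest step among the arguments (0 if none).
max_logged_step() {
  local max=0 step
  for step in "$@"; do
    (( step > max )) && max=$step
  done
  echo "$max"
}

# Hypothetical helper: succeed (status 0) only when the run reached max_steps,
# i.e. when validating metrics on a complete run makes sense.
should_validate() {
  local max_steps="$1"; shift
  local last
  last=$(max_logged_step "$@")
  (( last >= max_steps ))
}

# Example: a 10-step run that stopped at step 7 skips validation.
if should_validate 10 1 2 3 4 5 6 7; then
  echo "max_steps reached: validate metrics"
else
  echo "max_steps not reached: skip validation"
fi
```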

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

The changes are primarily configuration and shell script additions following established patterns. Although the file volume is significant (24 files), the files are homogeneous: each YAML config and test script shares a consistent structure with minimal logic. The variations are primarily in hyperparameter values, model names, and cluster configurations rather than architectural differences.

Possibly related PRs

  • feat: FP8 Training in Megatron Path #971: Megatron FP8 training enhancements directly correlate with the FP8-specific GRPO configurations being added (e.g., grpo-qwen3-32b-4n8g-fp8.yaml, grpo-qwen3-32b-4n8g-fp8-16k.yaml), which configure FP8 quantization, blockwise settings, and related parameters.

Suggested labels

CI:L1, Run CICD

Suggested reviewers

  • terrykong
  • yuki-97
  • parthchadha

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): The PR title "feat: Onboard perf recipes in tests" directly and accurately describes the primary change. The PR adds performance recipe configuration files (.yaml) for several LLMs (DeepSeek, Llama, Qwen) under examples/configs/, along with corresponding test scripts and an environment setup file under tests/. "Perf recipes" maps to the performance configuration files, and "in tests" reflects their integration into the test infrastructure. The title is concise and specific without needing to cover every implementation detail.
  • Test Results For Major Changes (✅ Passed): This PR adds configuration files and test scripts for GRPO performance testing across various models and hardware configurations; it does not modify core functionality, APIs, or product features. Test results documentation is required only for major changes such as new features, breaking changes, or significant refactoring, so the check passes without explicit test results.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 26

🧹 Nitpick comments (5)
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh (1)

6-12: Document or remove unused configuration variables.

The variables NUM_NODES, NUM_RUNS, and NUM_MINUTES are defined but not referenced in the script. If these are intentional (e.g., for documentation or future use), add a comment explaining their purpose.

tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh (1)

10-14: Document or remove unused configuration variables.

The variables NUM_NODES, NUM_RUNS, and NUM_MINUTES are defined but not referenced in the script. If these are for documentation or future use, add a comment explaining their purpose.

examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-1n8g.yaml (1)

17-17: Replace make_sequence_length_divisible_by: 1 with a meaningful divisor.

Requiring divisibility by 1 is a no-op. Consider using a value that aligns with the tensor parallelism factor (e.g., 2 to match tensor_model_parallel_size), or remove this setting if not needed.
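To see why a divisor of 1 changes nothing, here is the round-up arithmetic such a setting implies, sketched in bash. It assumes sequences are padded up to the next multiple of the divisor; pad_to_multiple is an illustrative name, not a function from the NeMo RL codebase.

```shell
#!/usr/bin/env bash
# Illustrative only: round a sequence length up to the next multiple of a divisor.
pad_to_multiple() {
  local len="$1" divisor="$2"
  echo $(( (len + divisor - 1) / divisor * divisor ))
}

pad_to_multiple 4093 1   # divisor 1: prints 4093 (length unchanged)
pad_to_multiple 4093 2   # divisor 2 (e.g. TP=2): prints 4094
```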

tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh (1)

8-12: Document or remove unused configuration variables.

The variables NUM_NODES, NUM_RUNS, and NUM_MINUTES are defined but not referenced in the script. If these are for documentation or future use, add a comment explaining their purpose.

tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g.sh (1)

3-3: Quote variable expansions for robustness.

Line 3 sources the environment file without quoting. While typically safe, it's better practice to quote variable expansions to handle paths with spaces or special characters.

-source $SCRIPT_DIR/common.env
+source "$SCRIPT_DIR/common.env"
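A quick way to see the failure mode this nitpick guards against: if SCRIPT_DIR contains a space, the unquoted form is split into two words, so source tries to read the wrong path. The self-contained demo below (demo_quoting is a hypothetical name) creates a throwaway directory with a space in its name and never touches the real common.env.

```shell
#!/usr/bin/env bash
# Demo: why `source "$SCRIPT_DIR/common.env"` needs quotes.
demo_quoting() {
  local SCRIPT_DIR
  SCRIPT_DIR="$(mktemp -d)/perf tests"   # directory name contains a space
  mkdir -p "$SCRIPT_DIR"
  echo 'LOADED=1' > "$SCRIPT_DIR/common.env"

  LOADED=0
  # Unquoted: word-split into ".../perf" and "tests/common.env", so this fails.
  # shellcheck disable=SC2086
  source $SCRIPT_DIR/common.env 2>/dev/null || echo "unquoted: failed"

  # Quoted: the path stays a single word and the file is sourced.
  source "$SCRIPT_DIR/common.env" && echo "quoted: LOADED=$LOADED"

  # Clean up the temporary parent directory.
  rm -rf "$(dirname "$SCRIPT_DIR")"
}
```

Calling demo_quoting prints "unquoted: failed" followed by "quoted: LOADED=1".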
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0a769cc and d0d54c3.

📒 Files selected for processing (24)
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-1n8g.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8.yaml (1 hunks)
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml (1 hunks)
  • tests/test_suites/llm/performance/common.env (1 hunks)
  • tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-1n8g.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8.sh (1 hunks)
  • tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh (1 hunks)
  • tests/test_suites/performance.txt (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
examples/configs/recipes/**/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

examples/configs/recipes/**/*.yaml: Recipe YAMLs under examples/configs/recipes/** are runnable snapshots and may omit documentation
When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name

Files:

  • examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-1n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.yaml
examples/configs/recipes/**/*.{yaml,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script

Files:

  • examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-1n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.yaml
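The recipe names in this PR encode the cluster shape as a `<nodes>n<gpus>g` tuple (for example, 32n8g is 32 nodes with 8 GPUs each), with Deepscaler-style context-length names as the stated exception. A small bash sketch that extracts the tuple and total GPU count from a recipe filename; parse_cluster_tuple is illustrative and not part of the repository:

```shell
#!/usr/bin/env bash
# Illustrative: pull the "<nodes>n<gpus>g" cluster tuple out of a recipe name
# and print "nodes gpus_per_node total_gpus".
parse_cluster_tuple() {
  local name re
  name="$(basename "$1")"
  re='-([0-9]+)n([0-9]+)g([.-]|$)'
  if [[ "$name" =~ $re ]]; then
    local nodes="${BASH_REMATCH[1]}" gpus="${BASH_REMATCH[2]}"
    echo "$nodes $gpus $(( nodes * gpus ))"
  else
    # e.g. Deepscaler recipes that encode context length instead.
    echo "no cluster tuple in $name" >&2
    return 1
  fi
}

parse_cluster_tuple examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
# prints: 32 8 256
```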
examples/configs/recipes/**

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Place recipe YAMLs under examples/configs/recipes//

Files:

  • examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-1n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-235b-16n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml
  • examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.yaml
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Use uv run to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling python directly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts

Files:

  • tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
  • tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.sh
  • tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-1n8g.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.sh
  • tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g.sh
tests/test_suites/**

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Place driver shell scripts and common.env under tests/test_suites// and list nightly tests in tests/test_suites/nightly.txt

Files:

  • tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8.sh
  • tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
  • tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.sh
  • tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-1n8g.sh
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh
  • tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.sh
  • tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g.sh
  • tests/test_suites/llm/performance/common.env
  • tests/test_suites/performance.txt
🧠 Learnings (1)
📚 Learning: 2025-10-12T14:46:57.171Z
Learnt from: zpqiu
PR: NVIDIA-NeMo/RL#1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:6-11
Timestamp: 2025-10-12T14:46:57.171Z
Learning: Test scripts in tests/test_suites/llm/ follow a standard configuration pattern that includes NUM_NODES, STEPS_PER_RUN, MAX_STEPS, NUM_RUNS (calculated as `$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))`), and NUM_MINUTES. These variables are part of the test infrastructure's standard interface and should not be flagged as unused even if not directly referenced within the individual script, as they are consumed by external launch tooling or common.env.

Applied to files:

  • tests/test_suites/llm/performance/common.env
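The NUM_RUNS expression in this learning is integer ceiling division: it counts how many jobs of STEPS_PER_RUN steps are needed to cover MAX_STEPS. Below is a sketch of such a config block with illustrative values (not taken from this PR). Since the variables are consumed by external tooling, one option for silencing SC2034 without deleting them is a ShellCheck directive placed directly after the shebang, which applies to the whole file.

```shell
#!/usr/bin/env bash
# shellcheck disable=SC2034  # config vars are read by external launch tooling
# --- config block (illustrative values, not from this PR) ---
NUM_NODES=4
STEPS_PER_RUN=10
MAX_STEPS=25
# Ceiling division: how many runs of STEPS_PER_RUN steps cover MAX_STEPS.
NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))
NUM_MINUTES=30
# --- end config block ---

echo "NUM_RUNS=$NUM_RUNS"   # prints NUM_RUNS=3
```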
🧬 Code graph analysis (8)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8.sh (1)
tests/test_suites/llm/performance/common.env (1)
  • exit_if_max_steps_reached (12-20)
tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
  • exit_if_max_steps_reached (12-20)
tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
  • exit_if_max_steps_reached (12-20)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
  • exit_if_max_steps_reached (12-20)
tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
  • exit_if_max_steps_reached (12-20)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-1n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
  • exit_if_max_steps_reached (12-20)
tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh (1)
tests/test_suites/llm/performance/common.env (1)
  • exit_if_max_steps_reached (12-20)
tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.sh (1)
tests/test_suites/llm/performance/common.env (1)
  • exit_if_max_steps_reached (12-20)
🪛 Shellcheck (0.11.0)
All 11 driver scripts report the same three findings; only the line numbers differ:
  • [warning] SC2034: NUM_NODES, NUM_RUNS, and NUM_MINUTES appear unused. Verify use (or export if used externally).
  • [warning] SC2164: Use 'cd ... || exit' or 'cd ... || return' in case cd fails.
  • [error] SC2068: Double quote array expansions to avoid re-splitting elements.

Per-script line numbers (SC2034 lines are in the order NUM_NODES, NUM_RUNS, NUM_MINUTES):
  • tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8.sh: SC2034 at 6, 9, 10; SC2164 at 16; SC2068 at 28
  • tests/test_suites/llm/performance/grpo-deepseek-v3-32n8g.sh: SC2034 at 10, 13, 14; SC2164 at 20; SC2068 at 34
  • tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g.sh: SC2034 at 8, 11, 12; SC2164 at 18; SC2068 at 29
  • tests/test_suites/llm/performance/grpo-qwen3-235b-16n8g.sh: SC2034 at 8, 11, 12; SC2164 at 18; SC2068 at 29
  • tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh: SC2034 at 6, 9, 10; SC2164 at 16; SC2068 at 28
  • tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g-16k.sh: SC2034 at 6, 9, 10; SC2164 at 16; SC2068 at 28
  • tests/test_suites/llm/performance/grpo-qwen3-30ba3b-4n8g.sh: SC2034 at 6, 9, 10; SC2164 at 16; SC2068 at 28
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-1n8g.sh: SC2034 at 6, 9, 10; SC2164 at 16; SC2068 at 28
  • tests/test_suites/llm/performance/grpo-llama3.1-8b-instruct-2n8g.sh: SC2034 at 6, 9, 10; SC2164 at 16; SC2068 at 28
  • tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.sh: SC2034 at 6, 9, 10; SC2164 at 16; SC2068 at 28
  • tests/test_suites/llm/performance/grpo-llama3.3-70b-instruct-4n8g.sh: SC2034 at 6, 9, 10; SC2164 at 16; SC2068 at 28
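The two recurring SC2164 and SC2068 findings have standard fixes. The sketch below shows the pattern only; it is not the scripts' actual contents, and run_in_dir and count_args are hypothetical helper names. The cd is guarded so a failure aborts instead of running from the wrong directory, and forwarded arguments are written as "$@" so each one stays a single word.

```shell
#!/usr/bin/env bash
# Hypothetical helper showing the SC2164 and SC2068 fixes together.
run_in_dir() {
  local dir="$1"; shift
  (
    # SC2164: stop if cd fails instead of continuing in the wrong directory.
    cd "$dir" || exit 1
    # SC2068: "$@" keeps every argument intact, even ones containing spaces;
    # an unquoted $@ would re-split them.
    "$@"
  )
}

# Print how many arguments a command receives after forwarding.
count_args() { echo "$#"; }

run_in_dir /tmp count_args "a b" c   # prints 2 (an unquoted $@ would give 3)
```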

🔇 Additional comments (8)
examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml (1)

1-59: LGTM. The YAML configuration is well-structured and complete for a GRPO DeepSeek-V3 performance run.

examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.yaml (1)

1-53: LGTM. The YAML configuration properly sets up FP8 quantization with consistent precision settings across policy and generation blocks.

examples/configs/recipes/llm/performance/grpo-llama3.3-70b-instruct-4n8g.yaml (1)

1-41: LGTM. The YAML configuration is properly structured for Llama-3.3-70B with appropriate Megatron parallelism settings.

examples/configs/recipes/llm/performance/grpo-qwen3-30ba3b-4n8g.yaml (1)

1-44: LGTM. The YAML configuration correctly sets up MoE-specific parallelism with appropriate expert parallelism for Qwen3-30B-A3B.

tests/test_suites/performance.txt (1)

1-12: LGTM. Test registry is correctly populated with GRPO performance test script references.

tests/test_suites/llm/performance/common.env (1)

1-45: LGTM!

The common environment setup provides robust error handling with set -eou pipefail, validates the config path existence, supports dry-run testing, and implements an early exit mechanism to save compute resources when max steps are reached.
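A rough sketch of two of the behaviors described, using hypothetical helper names (check_config_exists, pipeline_status) rather than the real contents of common.env: a config-path check that fails before any GPUs are requested, and a demonstration of what pipefail changes. `set -euo pipefail` is the more common ordering of the same flags.

```shell
#!/usr/bin/env bash
# Hypothetical fail-fast helpers in the spirit of common.env (not its actual code).
# A sourced env file would typically start with strict mode:
#   set -euo pipefail   # exit on error, error on unset vars, fail broken pipelines
# It is left commented out here so the demo does not alter the caller's shell.

# Fail early if the recipe YAML does not exist.
check_config_exists() {
  local config_path="$1"
  if [[ ! -f "$config_path" ]]; then
    echo "ERROR: config not found: $config_path" >&2
    return 1
  fi
}

# Run `false | true` in a fresh bash (optionally with extra options such as
# -o pipefail) and print the pipeline's exit status.
pipeline_status() {
  local status=0
  bash "$@" -c 'false | true' || status=$?
  echo "$status"
}

# Example with a hypothetical mistyped recipe path.
if ! check_config_exists examples/configs/recipes/llm/performance/grpo-typo.yaml 2>/dev/null; then
  echo "refusing to launch"
fi
pipeline_status              # prints 0: only the last stage's status counts
pipeline_status -o pipefail  # prints 1: the failing first stage is reported
```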

examples/configs/recipes/llm/performance/grpo-qwen3-32b-4n8g-fp8.yaml (1)

1-52: LGTM!

The FP8 configuration for Qwen3-32B is well-structured with appropriate parallelism settings (TP=4, PP=4), blockwise FP8 recipe, and matching vLLM generation configuration.
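As a sanity check on these parallelism settings, the data-parallel size follows from dividing the GPU count by the model-parallel product. A bash sketch of the arithmetic, assuming context parallelism of 1 and that training spans every GPU in the 4n8g allocation (both are assumptions, not stated in the review); data_parallel_size is an illustrative name:

```shell
#!/usr/bin/env bash
# Illustrative: derive data-parallel size from world size and model parallelism.
data_parallel_size() {
  local nodes="$1" gpus_per_node="$2" tp="$3" pp="$4" cp="${5:-1}"
  local world=$(( nodes * gpus_per_node ))
  local model_parallel=$(( tp * pp * cp ))
  if (( world % model_parallel != 0 )); then
    echo "world size $world is not divisible by TP*PP*CP=$model_parallel" >&2
    return 1
  fi
  echo $(( world / model_parallel ))
}

data_parallel_size 4 8 4 4   # 32 GPUs / (TP=4 * PP=4 * CP=1): prints 2
```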

tests/test_suites/llm/performance/grpo-qwen3-235b-32n8g.sh (1)

8-12: Verify exported or used variables.

Shellcheck reports NUM_NODES, NUM_RUNS, and NUM_MINUTES as unused. These variables are likely either:

  1. Exported for use by exit_if_max_steps_reached from common.env
  2. Used by downstream scripts or the GRPO runner

If truly unused, they should be removed; if intentionally exported, add a comment for clarity.

Can you confirm whether these variables are used by exit_if_max_steps_reached or other external code? If so, consider adding a brief comment above the CONFIG block to document their purpose.

Comment thread examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-1n8g.yaml Outdated
Comment thread examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml Outdated
Comment thread examples/configs/recipes/llm/performance/grpo-llama3.1-8b-instruct-2n8g.yaml Outdated
Comment thread examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g.yaml Outdated
Comment thread examples/configs/recipes/llm/performance/grpo-qwen3-235b-32n8g.yaml Outdated
Comment thread tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8-16k.sh
Comment thread tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8.sh
Comment thread tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g-fp8.sh
Comment thread tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
Comment thread tests/test_suites/llm/performance/grpo-qwen3-32b-4n8g.sh
Comment thread examples/configs/recipes/llm/performance/grpo-deepseek-v3-32n8g.yaml Outdated
guyueh1 and others added 5 commits October 27, 2025 11:25
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Co-authored-by: Zhiyu Li <zhiyul@NVIDIA.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com>
Co-authored-by: Guyue Huang <guyueh@nvidia.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>
@guyueh1 guyueh1 requested review from a team as code owners November 3, 2025 22:17
@github-actions github-actions Bot added the Documentation Improvements or additions to documentation label Nov 3, 2025
@guyueh1 guyueh1 added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Nov 8, 2025
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Nov 9, 2025
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
@guyueh1 guyueh1 added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Nov 9, 2025
@guyueh1
Contributor Author

guyueh1 commented Nov 10, 2025

@terrykong L0 CI is passing. Should we run more CI?

@terrykong terrykong enabled auto-merge (squash) November 10, 2025 20:17
@terrykong terrykong merged commit 3350ba2 into main Nov 10, 2025
41 of 42 checks passed
@terrykong terrykong deleted the guyueh/perf_tests branch November 10, 2025 20:18
chtruong814 pushed a commit that referenced this pull request Nov 10, 2025
Signed-off-by: Guyue Huang <guyueh@nvidia.com>
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@nvidia.com>
Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: Yi-Fu Wu <yifu.wu@gmail.com>
Signed-off-by: Parth Chadha <pchadha@nvidia.com>
Signed-off-by: Guyue Huang <140554423+guyueh1@users.noreply.github.com>
Co-authored-by: Youngeun Kwon <youngeunk@nvidia.com>
Co-authored-by: Zhiyu Li <zhiyul@NVIDIA.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Co-authored-by: Anna Shors <ashors@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Yi-Fu Wu <yifu.wu@gmail.com>
Co-authored-by: Parth Chadha <pchadha@nvidia.com>
Co-authored-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
zpqiu pushed a commit to sharonyu-115/RL that referenced this pull request Nov 17, 2025
Signed-off-by: Zhaopeng Qiu <alexq@nvidia.com>
PrinsYin pushed a commit to PrinsYin/RL that referenced this pull request Nov 30, 2025
DeL-TaiseiOzaki pushed a commit to DeL-TaiseiOzaki/RL that referenced this pull request Jan 8, 2026
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

Labels

CI:L0 Run doctests and unit tests
r0.4.0

3 participants