
fix: grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts runs 40 steps#1231

Merged
terrykong merged 3 commits into main from tk/fp8-fix-nightly
Sep 30, 2025

Conversation

@terrykong
Collaborator

@terrykong terrykong commented Sep 29, 2025

Another regression identified by https://github.com/NVIDIA-NeMo/RL/pull/1223/files

The original commit that added this test cannot complete 100 steps within 120 minutes on 1 node; the original step count was likely chosen for a 4-node run rather than a 1-node run.

(screenshot: W&B run, linked below)

https://wandb.ai/nvidia/nemo-rl?nw=ujibat1dqme

This change allows the test to complete and bumps the test version, since its results are not comparable with previous runs.

Summary by CodeRabbit

  • Chores

    • Updated example recipe to use v2 artifact names, aligning log directory and run naming for consistency.
  • Tests

    • Reduced step counts from 100 to 40 in the related test suite to shorten execution time and speed up validation.

Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong requested a review from guyueh1 September 29, 2025 22:24
@terrykong terrykong requested review from a team as code owners September 29, 2025 22:24
@terrykong terrykong added CI:L0 Run doctests and unit tests r0.4.0 labels Sep 29, 2025
@terrykong terrykong enabled auto-merge (squash) September 29, 2025 22:24
parthchadha
parthchadha previously approved these changes Sep 29, 2025
@terrykong terrykong added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Sep 29, 2025
@coderabbitai
Contributor

coderabbitai Bot commented Sep 29, 2025

📝 Walkthrough

Updated a GRPO LLaMA3.1 v2 recipe to use v2 log and W&B names, and adjusted a related test script to reduce step counts from 100 to 40. No other logic or control flow changes.

Changes

Cohort / File(s) | Summary

  • Recipe config v2 renaming (examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml): Update log_dir and wandb.name strings to .v2-suffixed values.
  • Test steps reduction (tests/test_suites/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.sh): Decrease STEPS_PER_RUN and MAX_STEPS from 100 to 40; no control-flow changes.
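The step-count change above can be sketched as follows. Only the variable names STEPS_PER_RUN and MAX_STEPS come from this PR; the surrounding script structure, including the derived run count, is an assumption for illustration:

```shell
#!/usr/bin/env bash
# Sketch of the nightly driver's step configuration (structure assumed;
# only STEPS_PER_RUN and MAX_STEPS are named in the PR walkthrough).
STEPS_PER_RUN=40   # was 100; 1 node cannot finish 100 steps in the 120-min budget
MAX_STEPS=40       # was 100

# Hypothetical derived value: how many back-to-back runs the suite would need.
NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))  # ceiling division
echo "runs=${NUM_RUNS} steps_per_run=${STEPS_PER_RUN}"
```

With both values at 40, a single run covers the whole budget, which matches the PR's goal of fitting the test into the 1-node, 120-minute window.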

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

CI:L0

Suggested reviewers

  • chtruong814
  • ashors1

Pre-merge checks and finishing touches

✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly indicates the primary change (limiting the grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts test to 40 steps), matches the PR's intent to fix the step-count regression, and is concise and focused.
  • Docstring Coverage: ✅ Passed. No functions found in the changes; docstring coverage check skipped.
  • Test Results For Major Changes: ✅ Passed. This PR is a test configuration adjustment rather than a major functional change: it reduces test step counts from 100 to 40 to reflect what 1-node execution can actually complete, addressing a timeout identified in PR 1223. The PR description adequately documents test results, including the empirical observation that 40 steps complete in practice, a screenshot reference, and a link to Weights & Biases run data. Since this is a test infrastructure fix and testing information is documented, the check passes.

📜 Recent review details

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17ea9ab and d14fb60.

📒 Files selected for processing (2)
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml (1 hunks)
  • tests/test_suites/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.sh (1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
examples/configs/recipes/**/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

examples/configs/recipes/**/*.yaml: Recipe YAMLs under examples/configs/recipes/** are runnable snapshots and may omit documentation
When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name

Files:

  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml
examples/configs/recipes/llm/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

LLM recipe YAML filenames must follow: <algo>-<model>-<N>n<G>g-<backend>[-modifiers][-long][.vN].yaml

Files:

  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml
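As a sketch of how the filename convention can be checked, the regex below is reconstructed from the example filename in this PR; it is not the project's actual validator:

```shell
#!/usr/bin/env bash
# Hypothetical check that a recipe filename matches
#   <algo>-<model>-<N>n<G>g-<backend>[-modifiers][-long][.vN].yaml
# The regex is an assumption inferred from the example, not repo tooling.
name="grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml"
pattern='^[a-z0-9]+-[a-z0-9.-]+-[0-9]+n[0-9]+g-[a-z0-9]+(-[a-z0-9-]+)?(-long)?(\.v[0-9]+)?\.yaml$'
if [[ "$name" =~ $pattern ]]; then
  echo "filename ok: $name"
else
  echo "filename does not match convention: $name"
fi
```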
examples/configs/recipes/**/*.{yaml,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script

Files:

  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml
examples/configs/recipes/**

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Place recipe YAMLs under examples/configs/recipes/<domain>/

Files:

  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Use uv run to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling python directly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts

Files:

  • tests/test_suites/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.sh
tests/test_suites/llm/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

LLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension

Files:

  • tests/test_suites/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.sh
tests/test_suites/**

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Place driver shell scripts and common.env under tests/test_suites/<domain>/ and list nightly tests in tests/test_suites/nightly.txt

Files:

  • tests/test_suites/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Docs_Tests
  • GitHub Check: Lint check
  • GitHub Check: sphinx-build / Build docs
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (2)
tests/test_suites/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.sh (1)

7-8: LGTM! Step count reduction aligns with PR objective.

The reduction from 100 to 40 steps correctly addresses the 1-node constraint identified in the PR description.

examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml (1)

45-50: LGTM! Version suffix properly applied.

The v2 naming for log directory and W&B experiment name correctly distinguishes this variant from the original recipe, ensuring results won't be mixed with previous runs.
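As a sketch of what this naming looks like, the fragment below assumes a plausible key layout; only the log_dir and W&B name fields are actually named in the PR, and their nesting here is illustrative:

```yaml
# Illustrative fragment only: key nesting is assumed, not taken from the repo.
# The PR suffixes the log directory and W&B run name with .v2 so results from
# the shortened 40-step test are not mixed with earlier 100-step runs.
logger:
  log_dir: logs/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2
  wandb:
    name: grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2
```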



guyueh1
guyueh1 previously approved these changes Sep 29, 2025
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong dismissed stale reviews from guyueh1 and parthchadha via b1c5d8b September 29, 2025 22:42
@terrykong terrykong added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Sep 29, 2025
@terrykong terrykong requested a review from guyueh1 September 29, 2025 22:45
@terrykong terrykong merged commit c2b36f2 into main Sep 30, 2025
40 of 41 checks passed
@terrykong terrykong deleted the tk/fp8-fix-nightly branch September 30, 2025 00:08
PrinsYin pushed a commit to PrinsYin/RL that referenced this pull request Nov 30, 2025
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
…ps (NVIDIA-NeMo#1231)

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

Labels

CI:L0 Run doctests and unit tests r0.4.0

3 participants