
fix: grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts runs 40 steps#1231

Merged
terrykong merged 3 commits into main from tk/fp8-fix-nightly
Sep 30, 2025

Conversation

@terrykong
Collaborator

@terrykong terrykong commented Sep 29, 2025

Another regression identified by https://github.com/NVIDIA-NeMo/RL/pull/1223/files

The original commit that added this test cannot complete 100 steps within 120 minutes on 1 node; the original step count was likely chosen for a 4-node run rather than a 1-node run.

(screenshot: W&B run, linked below)

https://wandb.ai/nvidia/nemo-rl?nw=ujibat1dqme

This change allows the test to complete and bumps the test version, since its results are not comparable with previous runs.

Summary by CodeRabbit

  • Chores

    • Updated example recipe to use v2 artifact names, aligning log directory and run naming for consistency.
  • Tests

    • Reduced step counts from 100 to 40 in the related test suite to shorten execution time and speed up validation.

Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong requested a review from guyueh1 September 29, 2025 22:24
@terrykong terrykong requested review from a team as code owners September 29, 2025 22:24
@terrykong terrykong added CI:L0 Run doctests and unit tests r0.4.0 labels Sep 29, 2025
@terrykong terrykong enabled auto-merge (squash) September 29, 2025 22:24
parthchadha
parthchadha previously approved these changes Sep 29, 2025
@terrykong terrykong added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Sep 29, 2025
@coderabbitai
Contributor

coderabbitai Bot commented Sep 29, 2025

📝 Walkthrough

Updated a GRPO LLaMA3.1 v2 recipe to use v2 log and W&B names, and adjusted a related test script to reduce step counts from 100 to 40. No other logic or control flow changes.

Changes

Cohort / File(s) | Summary

  • Recipe config v2 renaming (examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml): Update log_dir and wandb.name strings to .v2-suffixed values.
  • Test steps reduction (tests/test_suites/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.sh): Decrease STEPS_PER_RUN and MAX_STEPS from 100 to 40; no control-flow changes.
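The step-count change above can be sketched as follows. Only the variable names STEPS_PER_RUN and MAX_STEPS come from this PR; the surrounding script structure, including the derived run count, is an assumption for illustration:

```shell
#!/usr/bin/env bash
# Sketch of the nightly driver's step configuration (structure assumed;
# only STEPS_PER_RUN and MAX_STEPS are named in the PR walkthrough).
STEPS_PER_RUN=40   # was 100; 1 node cannot finish 100 steps in the 120-min budget
MAX_STEPS=40       # was 100

# Hypothetical derived value: how many back-to-back runs the suite would need.
NUM_RUNS=$(( (MAX_STEPS + STEPS_PER_RUN - 1) / STEPS_PER_RUN ))  # ceiling division
echo "runs=${NUM_RUNS} steps_per_run=${STEPS_PER_RUN}"
```

With both values at 40, a single run covers the whole budget, which matches the PR's goal of fitting the test into the 1-node, 120-minute window.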

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested labels

CI:L0

Suggested reviewers

  • chtruong814
  • ashors1

Pre-merge checks and finishing touches

✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly indicates the primary change (limiting the grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts test to 40 steps), matches the PR's intent to fix the step-count regression, and is concise and focused.
  • Docstring Coverage: ✅ Passed. No functions found in the changes; docstring coverage check skipped.
  • Test Results For Major Changes: ✅ Passed. This PR is a test configuration adjustment rather than a major functional change: it reduces test step counts from 100 to 40 to reflect what 1-node execution can actually complete, addressing a timeout identified in PR 1223. The PR description adequately documents test results, including the empirical observation that 40 steps complete in practice, a screenshot reference, and a link to Weights & Biases run data. Since this is a test infrastructure fix and testing information is documented, the check passes.

📜 Recent review details

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17ea9ab and d14fb60.

📒 Files selected for processing (2)
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml (1 hunks)
  • tests/test_suites/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.sh (1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
examples/configs/recipes/**/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

examples/configs/recipes/**/*.yaml: Recipe YAMLs under examples/configs/recipes/** are runnable snapshots and may omit documentation
When adding support for a new model, add a recipe YAML under examples/configs/recipes/ in the appropriate domain (llm/ or vlm/) with the correct name

Files:

  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml
examples/configs/recipes/llm/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

LLM recipe YAML filenames must follow: <algo>-<model>-<N>n<G>g-<backend>[-modifiers][-long][.vN].yaml

Files:

  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml
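As a sketch of how the filename convention can be checked, the regex below is reconstructed from the example filename in this PR; it is not the project's actual validator:

```shell
#!/usr/bin/env bash
# Hypothetical check that a recipe filename matches
#   <algo>-<model>-<N>n<G>g-<backend>[-modifiers][-long][.vN].yaml
# The regex is an assumption inferred from the example, not repo tooling.
name="grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml"
pattern='^[a-z0-9]+-[a-z0-9.-]+-[0-9]+n[0-9]+g-[a-z0-9]+(-[a-z0-9-]+)?(-long)?(\.v[0-9]+)?\.yaml$'
if [[ "$name" =~ $pattern ]]; then
  echo "filename ok: $name"
else
  echo "filename does not match convention: $name"
fi
```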
examples/configs/recipes/**/*.{yaml,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Known exception: Deepscaler recipes may encode context length in place of the cluster tuple (e.g., grpo-deepscaler-1.5b-8K.*); allowed but document intended hardware in the script

Files:

  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml
examples/configs/recipes/**

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Place recipe YAMLs under examples/configs/recipes/<domain>/

Files:

  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Follow the Google Shell Style Guide for all shell scripts
Use uv run to execute Python scripts in shell/driver scripts instead of activating virtualenvs and calling python directly
Add the NVIDIA copyright header (with current year) at the top of all shell scripts, excluding tests/ and test-only scripts

Files:

  • tests/test_suites/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.sh
tests/test_suites/llm/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

LLM driver script filenames must mirror the YAML base name and follow the same pattern with .sh extension

Files:

  • tests/test_suites/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.sh
tests/test_suites/**

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Place driver shell scripts and common.env under tests/test_suites/<domain>/ and list nightly tests in tests/test_suites/nightly.txt

Files:

  • tests/test_suites/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Docs_Tests
  • GitHub Check: Lint check
  • GitHub Check: sphinx-build / Build docs
  • GitHub Check: Lint check
  • GitHub Check: Lint check
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (2)
tests/test_suites/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.sh (1)

7-8: LGTM! Step count reduction aligns with PR objective.

The reduction from 100 to 40 steps correctly addresses the 1-node constraint identified in the PR description.

examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml (1)

45-50: LGTM! Version suffix properly applied.

The v2 naming for log directory and W&B experiment name correctly distinguishes this variant from the original recipe, ensuring results won't be mixed with previous runs.
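As a sketch of what this naming looks like, the fragment below assumes a plausible key layout; only the log_dir and W&B name fields are actually named in the PR, and their nesting here is illustrative:

```yaml
# Illustrative fragment only: key nesting is assumed, not taken from the repo.
# The PR suffixes the log directory and W&B run name with .v2 so results from
# the shortened 40-step test are not mixed with earlier 100-step runs.
logger:
  log_dir: logs/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2
  wandb:
    name: grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2
```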



guyueh1
guyueh1 previously approved these changes Sep 29, 2025
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong dismissed stale reviews from guyueh1 and parthchadha via b1c5d8b September 29, 2025 22:42
@terrykong terrykong added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Sep 29, 2025
@terrykong terrykong requested a review from guyueh1 September 29, 2025 22:45
@terrykong terrykong merged commit c2b36f2 into main Sep 30, 2025
40 of 41 checks passed
@terrykong terrykong deleted the tk/fp8-fix-nightly branch September 30, 2025 00:08
PrinsYin pushed a commit to PrinsYin/RL that referenced this pull request Nov 30, 2025
yuanhangsu1986 pushed a commit to yuanhangsu1986/RL-Nemontron-Edge-Omni that referenced this pull request Feb 21, 2026
…ps (NVIDIA-NeMo#1231)

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

Labels

CI:L0 Run doctests and unit tests r0.4.0

3 participants