fix: grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts runs 40 steps#1231
Signed-off-by: Terry Kong <terryk@nvidia.com>
📝 Walkthrough: Updated a GRPO LLaMA3.1 v2 recipe to use v2 log and W&B names, and adjusted a related test script to reduce step counts from 100 to 40. No other logic or control flow changes.
…ps (NVIDIA-NeMo#1231) Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
Another regression identified by https://github.com/NVIDIA-NeMo/RL/pull/1223/files
Basically, the test as originally committed does not complete 100 steps within 120 min on 1 node. This is likely because the original step count was chosen for a 4-node run rather than a 1-node run.
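A back-of-envelope check of why 100 steps cannot fit in the 120-min budget on 1 node. The timings below are assumptions for illustration (the 110-min 4-node wall time and the linear-scaling factor are not measured values from this PR):

```shell
# Back-of-envelope estimate (assumed timings, not measured)
TIME_LIMIT_MIN=120
FOUR_NODE_STEPS=100
FOUR_NODE_MIN=110      # assumed wall time for 100 steps on 4 nodes
SLOWDOWN=4             # naive assumption: throughput scales with node count
# steps a 1-node run could finish inside the time limit
FEASIBLE=$(( TIME_LIMIT_MIN * FOUR_NODE_STEPS / (FOUR_NODE_MIN * SLOWDOWN) ))
echo "$FEASIBLE"       # prints 27
```

Under these assumed numbers a 1-node run finishes well under 100 steps before timing out, so a reduced budget of 40 steps (with some margin in the real timings) is the kind of adjustment this PR makes.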
https://wandb.ai/nvidia/nemo-rl?nw=ujibat1dqme
This change reduces the step count so the test can complete, and bumps the test's version since its results are not comparable with previous runs.
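A minimal sketch of the shape of the test-script change. The variable names are hypothetical, not the actual diff; only the values (100 → 40 steps, name bumped to v2) come from this PR:

```shell
# Hypothetical test-script fragment (names assumed, not the actual diff)
MAX_STEPS=40   # was 100; a 1-node run cannot finish 100 steps in 120 min
# version bumped to v2 since results are not comparable with earlier runs
EXP_NAME=grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts-v2
echo "running ${EXP_NAME} for ${MAX_STEPS} steps"
```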