docs: Document Gym + RL integration design #1762
9527953 to 9ea9f1a
📝 Walkthrough
This PR changes two documentation files: it adds a new design document describing the NeMo Gym integration architecture, initialization sequence, training loop, data formats, and tokenization, with Mermaid diagrams, and it updates the documentation index to include the new design document in the navigation structure.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~5 minutes
🚥 Pre-merge checks: ✅ 4 passed
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/design-docs/nemo-gym-integration.md`:
- Around line 22-23: Update the inline comments for the two config keys to
explicitly state their relationship: note that async_engine and
expose_http_server are independent settings but both must be enabled to support
the HTTP server; e.g., change the comment on async_engine to clarify it enables
the async worker/runtime and the comment on expose_http_server to state it
controls whether the HTTP server (exposing /v1/chat/completions) is started, and
add a combined comment line that both must be true to enable HTTP server
support.
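To make the requested relationship concrete, here is a minimal sketch of the rule the comment asks the doc to state. Only the key names `async_engine` and `expose_http_server` come from the review comment; the flat dictionary layout and the helper `http_server_supported` are hypothetical and are not NeMo RL's actual config schema or API.

```python
# Hypothetical illustration: the two flags are independent settings,
# but HTTP server support requires both of them to be enabled.

def http_server_supported(cfg: dict) -> bool:
    """Return True only when both flags are enabled (missing keys count as False)."""
    async_enabled = bool(cfg.get("async_engine", False))      # async worker/runtime
    http_enabled = bool(cfg.get("expose_http_server", False))  # start the HTTP server
    return async_enabled and http_enabled


# Assumed flat layout for illustration; the real config may nest these keys.
both_on = {"async_engine": True, "expose_http_server": True}
only_async = {"async_engine": True, "expose_http_server": False}
only_http = {"async_engine": False, "expose_http_server": True}

supported = [http_server_supported(c) for c in (both_on, only_async, only_http)]
```

Under this sketch, only the configuration with both flags set to true reports HTTP server support.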
🧹 Nitpick comments (1)
docs/design-docs/nemo-gym-integration.md (1)
184-184: Minor grammar refinement.
For consistency with the formal tone used throughout the documentation, consider revising "Results return out of order" to "Results are returned out of order".
📝 Suggested revision
-1. **Results return out of order**: Rollouts complete at different times depending on conversation length and tool calls. Rather than waiting for all results, the actor processes each result as soon as it completes.
+1. **Results are returned out of order**: Rollouts complete at different times depending on conversation length and tool calls. Rather than waiting for all results, the actor processes each result as soon as it completes.
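The behavior described in this line (handling each rollout as soon as it finishes instead of waiting for the whole batch) can be sketched with the standard library's `asyncio.as_completed`. This is an illustration of the general pattern only, not the actor's actual implementation; `fake_rollout` and `collect_as_completed` are hypothetical names, and the sleeps stand in for rollouts of different lengths.

```python
import asyncio


async def fake_rollout(idx: int, delay: float) -> dict:
    """Hypothetical stand-in for a rollout; longer delays mimic longer conversations."""
    await asyncio.sleep(delay)
    return {"idx": idx, "delay": delay}


async def collect_as_completed(delays: list[float]) -> list[int]:
    """Start all rollouts, then handle each result as soon as it completes."""
    tasks = [asyncio.create_task(fake_rollout(i, d)) for i, d in enumerate(delays)]
    completion_order = []
    for finished in asyncio.as_completed(tasks):
        result = await finished  # yields results in completion order, not submission order
        completion_order.append(result["idx"])
    return completion_order


# Rollout 0 is the slowest and rollout 1 the fastest, so results arrive out of order.
completion_order = asyncio.run(collect_as_completed([0.15, 0.05, 0.10]))
```

Because each result carries its original index (`idx` here), a consumer can still map results back to the prompts that produced them after out-of-order completion.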
terrykong left a comment
thanks for writing this doc @ananthsub !
d2deb5e to 3035b26
jgerh left a comment
Completed tech pubs review. No comments. LGTM.
Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
3035b26 to 04735bb
Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com> Signed-off-by: yuanhangs <yuanhangs@nvidia.com>
What does this PR do?
Part of NVIDIA-NeMo/Gym#292
This PR documents the NeMo RL + Gym integration, which includes:
Issues
NVIDIA-NeMo/Gym#292
Usage
# Add a code snippet demonstrating how to use this
Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit