docs: Document Gym + RL integration design by ananthsub · Pull Request #1762 · NVIDIA-NeMo/RL

ananthsub · 2026-01-12T12:57:02Z

What does this PR do?

Part of NVIDIA-NeMo/Gym#292

This PR documents the NeMo RL + Gym integration, which includes:

- The Ray actor bridge code in RL that initializes and launches Gym, and how Gym reuses the Ray cluster info
- How RL prepares its vLLM servers for Gym to proxy through to, so that inference logic stays contained within RL
- The training loop flow: how RL sends request data to Gym, and how the data is translated between Gym and RL formats

Issues

NVIDIA-NeMo/Gym#292

Usage

You can potentially add a usage example below

# Add a code snippet demonstrating how to use this

Before your PR is "Ready for review"

Pre checks:

- Make sure you read and followed Contributor guidelines
- Did you write any new necessary tests?
- Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
- Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

...

Summary by CodeRabbit

Documentation
- Added comprehensive design documentation for NeMo Gym integration covering the complete system architecture, actor components, HTTP proxy configuration, initialization procedures, full training loop control flow, data translation specifications, tokenization requirements, API endpoint definitions, and integration workflows. Includes detailed visual architectural diagrams throughout.

coderabbitai · 2026-02-02T21:17:25Z

📝 Walkthrough

Two documentation files were added: a new design document describing the NeMo Gym integration architecture, initialization sequence, training loop, data formats, and tokenization with Mermaid diagrams; and an update to the documentation index to include the new design document in the navigation structure.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Design Documentation: `docs/design-docs/nemo-gym-integration.md`, `docs/index.md` | New design documentation file for NeMo Gym integration describing architecture (NemoGym Actor, vLLM HTTP proxy, rollouts flow), initialization sequence, training loop control flow, data format translation, and tokenization with visual diagrams. Documentation index updated to include the new design document. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely summarizes the main change: adding documentation for the Gym and RL integration design. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Test Results For Major Changes | ✅ Passed | PR contains only documentation changes with no code modifications, new features, or impact on numerics/performance, qualifying as a minor change. |

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@docs/design-docs/nemo-gym-integration.md`:
- Around line 22-23: Update the inline comments for the two config keys to
explicitly state their relationship: note that async_engine and
expose_http_server are independent settings but both must be enabled to support
the HTTP server; e.g., change the comment on async_engine to clarify it enables
the async worker/runtime and the comment on expose_http_server to state it
controls whether the HTTP server (exposing /v1/chat/completions) is started, and
add a combined comment line that both must be true to enable HTTP server
support.
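
The relationship between the two keys can be sketched as a small check. This is a hypothetical illustration, not code from NeMo RL: the function names and the flat dict layout are assumptions, and only the key names `async_engine` and `expose_http_server` come from the comment above.

```python
def http_server_enabled(vllm_cfg: dict) -> bool:
    """Return True only when both independent flags are enabled.

    `vllm_cfg` is a hypothetical flat dict; the real config may nest
    these keys differently.
    """
    async_engine = bool(vllm_cfg.get("async_engine", False))
    expose_http_server = bool(vllm_cfg.get("expose_http_server", False))
    # The flags are set independently, but HTTP server support needs both.
    return async_engine and expose_http_server


def missing_http_flags(vllm_cfg: dict) -> list:
    """List the flags that still need to be turned on (hypothetical helper)."""
    required = ("async_engine", "expose_http_server")
    return [key for key in required if not vllm_cfg.get(key, False)]
```

For example, a config with only `expose_http_server` set would report `async_engine` as missing, which is the case the suggested combined comment is meant to make obvious to readers.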

🧹 Nitpick comments (1)

docs/design-docs/nemo-gym-integration.md (1)
184-184: Minor grammar refinement.

For consistency with the formal tone used throughout the documentation, consider revising "Results return out of order" to "Results are returned out of order".
📝 Suggested revision
-1. **Results return out of order**: Rollouts complete at different times depending on conversation length and tool calls. Rather than waiting for all results, the actor processes each result as soon as it completes.
+1. **Results are returned out of order**: Rollouts complete at different times depending on conversation length and tool calls. Rather than waiting for all results, the actor processes each result as soon as it completes.
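
The out-of-order pattern described in that sentence can be sketched with `asyncio.as_completed` from the Python standard library. This is a minimal sketch under stated assumptions, not the actor's actual implementation: `fake_rollout`, `collect_as_completed`, and the `delay` parameter are hypothetical stand-ins, with `delay` modelling differences in conversation length and tool calls.

```python
import asyncio


async def fake_rollout(idx: int, delay: float) -> dict:
    """Hypothetical stand-in for one rollout; `delay` models its duration."""
    await asyncio.sleep(delay)
    return {"idx": idx}


async def collect_as_completed(delays: list) -> list:
    """Handle each result as soon as it finishes, not in submission order."""
    tasks = [
        asyncio.create_task(fake_rollout(i, d)) for i, d in enumerate(delays)
    ]
    completion_order = []
    for finished in asyncio.as_completed(tasks):
        result = await finished
        # Per-result processing would happen here, without waiting for
        # the slower rollouts that are still in flight.
        completion_order.append(result["idx"])
    return completion_order


if __name__ == "__main__":
    print(asyncio.run(collect_as_completed([0.2, 0.0, 0.1])))
```

Because each result carries its original index, the caller can still restore input order afterwards if the downstream consumer needs it.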

terrykong

thanks for writing this doc @ananthsub !

terrykong

lgtm

+@jgerh for tech edit

jgerh

Completed tech pubs review. No comments. LGTM.

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
Signed-off-by: yuanhangs <yuanhangs@nvidia.com>

ananthsub requested a review from bxyu-nvidia January 12, 2026 12:57

ananthsub added the Documentation label (Improvements or additions to documentation) Jan 12, 2026

ananthsub temporarily deployed to nemo-ci January 12, 2026 12:57 — with GitHub Actions Inactive

ananthsub temporarily deployed to nemo-ci January 12, 2026 13:00 — with GitHub Actions Inactive

ananthsub force-pushed the docs-rl-gym-diagram branch from 9527953 to 9ea9f1a Compare February 2, 2026 17:39

ananthsub temporarily deployed to nemo-ci February 2, 2026 17:39 — with GitHub Actions Inactive

ananthsub changed the title ~~[docs] Document Gym + RL integration design~~ docs: Document Gym + RL integration design Feb 2, 2026

ananthsub temporarily deployed to nemo-ci February 2, 2026 18:06 — with GitHub Actions Inactive

ananthsub marked this pull request as ready for review February 2, 2026 21:10

ananthsub requested a review from a team as a code owner February 2, 2026 21:10

coderabbitai Bot reviewed Feb 2, 2026

Comment thread docs/design-docs/nemo-gym-integration.md Outdated

terrykong reviewed Feb 3, 2026

Comment thread docs/design-docs/nemo-gym-integration.md Outdated

Comment thread docs/design-docs/nemo-gym-integration.md Outdated

Comment thread docs/design-docs/nemo-gym-integration.md Outdated

Comment thread docs/design-docs/nemo-gym-integration.md Outdated

terrykong had a problem deploying to nemo-ci February 3, 2026 07:41 — with GitHub Actions Error

terrykong reviewed Feb 3, 2026

Comment thread docs/design-docs/nemo-gym-integration.md Outdated

Comment thread docs/design-docs/nemo-gym-integration.md Outdated

terrykong temporarily deployed to nemo-ci February 3, 2026 07:45 — with GitHub Actions Inactive

terrykong reviewed Feb 3, 2026

Comment thread docs/design-docs/nemo-gym-integration.md

terrykong reviewed Feb 3, 2026

Comment thread docs/design-docs/nemo-gym-integration.md Outdated

Comment thread docs/design-docs/nemo-gym-integration.md

Comment thread docs/design-docs/nemo-gym-integration.md Outdated

Comment thread docs/design-docs/nemo-gym-integration.md

terrykong temporarily deployed to nemo-ci February 3, 2026 08:42 — with GitHub Actions Inactive

ananthsub force-pushed the docs-rl-gym-diagram branch from d2deb5e to 3035b26 Compare February 4, 2026 23:53

ananthsub temporarily deployed to nemo-ci February 4, 2026 23:53 — with GitHub Actions Inactive

ananthsub requested a review from terrykong February 4, 2026 23:54

terrykong approved these changes Feb 5, 2026

terrykong requested a review from jgerh February 5, 2026 00:33

ananthsub temporarily deployed to nemo-ci February 5, 2026 07:04 — with GitHub Actions Inactive

jgerh reviewed Feb 5, 2026

[docs] Add gym + rl design integration

dc3be68

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>

address feedback

04735bb

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>

ananthsub force-pushed the docs-rl-gym-diagram branch from 3035b26 to 04735bb Compare February 12, 2026 18:22

ananthsub temporarily deployed to nemo-ci February 12, 2026 18:23 — with GitHub Actions Inactive

ananthsub temporarily deployed to nemo-ci February 12, 2026 20:08 — with GitHub Actions Inactive

terrykong merged commit 869b5e5 into NVIDIA-NeMo:main Feb 13, 2026
27 checks passed

ananthsub deleted the docs-rl-gym-diagram branch February 13, 2026 20:10

coderabbitai Bot mentioned this pull request Feb 17, 2026

docs: fern migration #1975

Closed

seonjinn pushed a commit that referenced this pull request Mar 8, 2026

docs: Document Gym + RL integration design (#1762)

d2a02e5

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>

seonjinn pushed a commit that referenced this pull request Mar 8, 2026

docs: Document Gym + RL integration design (#1762)

37da20d

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>

seonjinn pushed a commit that referenced this pull request Mar 9, 2026

docs: Document Gym + RL integration design (#1762)

1d83aa7

Signed-off-by: Ananth Subramaniam <ansubramania@nvidia.com>
