chore: add assert for tp4 batch variant accuracy issue by yuki-97 · Pull Request #1861 · NVIDIA-NeMo/RL

yuki-97 · 2026-02-02T06:33:58Z

As title.

Summary by CodeRabbit

Refactor
- Optimized loss function initialization workflow by consolidating setup operations.
Bug Fixes
- Added runtime validation safeguard for tensor model parallel configurations to ensure batch size consistency.

coderabbitai · 2026-02-02T06:38:10Z


Walkthrough

Loss function creation and validation in the GRPO algorithm were moved from `initialize_generation_with_policy` into `setup`, so they run earlier in the pipeline. Additionally, a runtime safeguard was added to the policy model that requires `train_micro_batch_size` to equal `logprob_batch_size` when tensor model parallelism is used with tp_size >= 4.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| GRPO Loss Function Setup (`nemo_rl/algorithms/grpo.py`) | Relocated loss function instantiation and force_on_policy_ratio validation from `initialize_generation_with_policy` to the `setup` method. Removes duplicate initialization logic and consolidates loss-function checks earlier in the pipeline. |
| Policy Model Parallelism Safeguard (`nemo_rl/models/policy/lm_policy.py`) | Added runtime assertion in `Policy.__init__` to enforce that `train_micro_batch_size` equals `logprob_batch_size` when tensor model parallel size is 4 or greater, unless the `NRL_IGNORE_TP_ACCURACY_CHECK` environment variable is set. Includes a detailed remediation guidance message. |
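The safeguard summarized in the Changes table can be sketched as follows. This is a minimal illustration of the described behavior, not the code merged into `lm_policy.py`: the helper name `check_tp_batch_sizes` and the flat `config` dict are hypothetical, and any non-empty value of `NRL_IGNORE_TP_ACCURACY_CHECK` is assumed to disable the check.

```python
import os


def check_tp_batch_sizes(config: dict, tp_size: int) -> None:
    """Hypothetical sketch of the TP >= 4 batch-size guard described above."""
    # Assumption: any non-empty value of the env var bypasses the check.
    if os.environ.get("NRL_IGNORE_TP_ACCURACY_CHECK"):
        return
    if tp_size >= 4:
        # Training and logprob passes must use the same batch size at TP >= 4.
        assert config["train_micro_batch_size"] == config["logprob_batch_size"], (
            f"With tensor parallel size {tp_size} (>= 4), train_micro_batch_size "
            f"({config['train_micro_batch_size']}) must equal logprob_batch_size "
            f"({config['logprob_batch_size']}). Align the two values, or set "
            "NRL_IGNORE_TP_ACCURACY_CHECK=1 to bypass this check."
        )
```

For example, `check_tp_batch_sizes({"train_micro_batch_size": 2, "logprob_batch_size": 2}, tp_size=4)` passes, while mismatched sizes at `tp_size=8` raise `AssertionError`. Because this form relies on `assert`, it is skipped entirely under `python -O`, which is one of the points the CodeRabbit review raises.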

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Suggested labels

CI:L1

Suggested reviewers

hemildesai
terrykong


✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title mentions adding an assert for a tp4 batch variant accuracy issue, which aligns with the primary change in nemo_rl/models/policy/lm_policy.py where a runtime safeguard assertion is added. However, the title only partially reflects the changeset: it doesn't mention the secondary change in grpo.py involving loss function refactoring. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Test Results For Major Changes | ✅ Passed | PR contains minor changes: code reorganization in grpo.py and defensive validation in lm_policy.py. No algorithmic modifications, no feature additions, no numerical impact. Test documentation not required for minor changes. |


coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@nemo_rl/algorithms/grpo.py`:
- Around line 303-313: Print for the force_on_policy_ratio check in grpo.py
doesn't flush output like other prints in setup(); update the print call that
outputs "  ✓ force_on_policy_ratio enabled" to include flush=True so it behaves
consistently with other setup() output (locate the block that checks
loss_config.get("force_on_policy_ratio"), which sets
os.environ["NRL_IGNORE_TP_ACCURACY_CHECK"] and currently calls print()).
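A minimal sketch of the block this comment targets, with the suggested `flush=True` applied. The function name `apply_force_on_policy_ratio` is hypothetical, and the value `"1"` written to `NRL_IGNORE_TP_ACCURACY_CHECK` is an assumption; the comment only says the block sets that variable.

```python
import os


def apply_force_on_policy_ratio(loss_config: dict) -> bool:
    """Hypothetical helper mirroring the force_on_policy_ratio block in setup()."""
    if loss_config.get("force_on_policy_ratio"):
        # Per the review, enabling this option sets the bypass env var.
        # The value "1" is an assumption; the review does not state it.
        os.environ["NRL_IGNORE_TP_ACCURACY_CHECK"] = "1"
        # flush=True keeps this line ordered with the other setup() output
        # when stdout is buffered (e.g. redirected to a log file).
        print("  ✓ force_on_policy_ratio enabled", flush=True)
        return True
    return False
```

Flushing matters mostly when stdout is not a terminal: without it, this line can appear out of order relative to other flushed setup messages.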

In `@nemo_rl/models/policy/lm_policy.py`:
- Around line 138-151: Replace the runtime assert with an explicit conditional
that always runs: parse os.environ.get("NRL_IGNORE_TP_ACCURACY_CHECK") by
normalizing to lowercase and treating only "1", "true", or "yes" as truthy, then
if the bypass is not set and tp_size >= 4 check if
config["train_micro_batch_size"] != config["logprob_batch_size"] and raise a
RuntimeError (or ValueError) with the same multi-line message; update the block
around tp_size and the config checks in lm_policy.py (refer to tp_size,
config["train_micro_batch_size"], config["logprob_batch_size"], and the
NRL_IGNORE_TP_ACCURACY_CHECK env var) so the validation cannot be skipped by
Python -O or by setting the env var to "0"/"false".
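A sketch of the hardening this comment proposes, assuming a flat `config` dict. The helper names `tp_accuracy_check_bypassed` and `validate_tp_batch_sizes` are hypothetical; `ValueError` is one of the two exception types the comment allows, and stripping surrounding whitespace before comparison is a small addition beyond the comment.

```python
import os

# Only these normalized strings count as "bypass enabled".
_TRUTHY = {"1", "true", "yes"}


def tp_accuracy_check_bypassed() -> bool:
    # "0", "false", or an empty value keep the check active.
    value = os.environ.get("NRL_IGNORE_TP_ACCURACY_CHECK", "")
    return value.strip().lower() in _TRUTHY


def validate_tp_batch_sizes(config: dict, tp_size: int) -> None:
    """Explicit check that, unlike `assert`, still runs under `python -O`."""
    if tp_accuracy_check_bypassed() or tp_size < 4:
        return
    train_bs = config["train_micro_batch_size"]
    logprob_bs = config["logprob_batch_size"]
    if train_bs != logprob_bs:
        raise ValueError(
            f"Tensor parallel size {tp_size} (>= 4) requires "
            f"train_micro_batch_size ({train_bs}) == logprob_batch_size "
            f"({logprob_bs}).\n"
            "Set both to the same value, or export "
            "NRL_IGNORE_TP_ACCURACY_CHECK=1 to skip this check."
        )
```

Raising an exception instead of asserting keeps the validation active under optimized bytecode, and the explicit truthy set means `NRL_IGNORE_TP_ACCURACY_CHECK=0` no longer silently disables it.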


yuki-97 requested review from a team as code owners February 2, 2026 06:33

yuki-97 added the CI:L1 label (Run doctests, unit tests, and functional tests) Feb 2, 2026

yuki-97 temporarily deployed to nemo-ci February 2, 2026 06:34 with GitHub Actions (inactive)

yuki-97 requested review from terrykong and yfw February 2, 2026 06:34

coderabbitai (bot) reviewed Feb 2, 2026

Comment thread nemo_rl/algorithms/grpo.py

Comment thread nemo_rl/models/policy/lm_policy.py (outdated)

yuki-97 temporarily deployed to nemo-ci February 2, 2026 06:56 with GitHub Actions (inactive)

yuki-97 force-pushed the yukih/assert-diff-bs branch from 42d0467 to e10241c February 2, 2026 08:08

yuki-97 requested a review from a team as a code owner February 2, 2026 08:08

github-actions (bot) added the Documentation label (Improvements or additions to documentation) Feb 2, 2026

yuki-97 commented Feb 2, 2026

Comment thread docs/guides/dtensor-tp-accuracy.md

yuki-97 removed and re-added the CI:L1 label (Run doctests, unit tests, and functional tests) Feb 2, 2026

yuki-97 temporarily deployed to nemo-ci February 2, 2026 08:12 with GitHub Actions (inactive)

yuki-97 temporarily deployed to nemo-ci February 2, 2026 09:22 with GitHub Actions (inactive)

yuki-97 requested a review from a team as a code owner February 2, 2026 11:33

yuki-97 removed and re-added the CI:L1 label (Run doctests, unit tests, and functional tests) Feb 2, 2026

yuki-97 had a problem deploying to nemo-ci February 2, 2026 11:33 with GitHub Actions (error)

yuki-97 force-pushed the yukih/assert-diff-bs branch from 3eb8376 to 29465d7 February 2, 2026 11:35

yuki-97 removed and re-added the CI:L1 label (Run doctests, unit tests, and functional tests) Feb 2, 2026

yuki-97 temporarily deployed to nemo-ci February 2, 2026 11:36 with GitHub Actions (inactive)

yuki-97 temporarily deployed to nemo-ci February 2, 2026 11:40 with GitHub Actions (inactive)

yuki-97 temporarily deployed to nemo-ci February 2, 2026 14:23 with GitHub Actions (inactive)

terrykong reviewed Feb 2, 2026

Comment thread examples/configs/recipes/llm/grpo-gemma3-27b-it-8n8g-fsdp2tp8-actckpt-long.yaml

yuki-97 force-pushed the yukih/assert-diff-bs branch from 29465d7 to ea65045 February 3, 2026 03:12

terrykong approved these changes Feb 3, 2026

terrykong enabled auto-merge (squash) February 3, 2026 04:55

terrykong removed and re-added the CI:L1 label (Run doctests, unit tests, and functional tests) Feb 3, 2026

terrykong temporarily deployed to nemo-ci February 3, 2026 04:55 with GitHub Actions (inactive)

terrykong temporarily deployed to nemo-ci February 3, 2026 06:34 with GitHub Actions (inactive)

yuki-97 added 5 commits February 3, 2026 22:10

- add assert for tp issue (a3cc8b6)
- remove model specific content in doc (048cc46)
- fix unit test (341c6df)
- update configs to have same bs when tp>=4 (6dd4c4e)
- add unit test to prevent (cb22ea7)

Each commit is signed off by Yuki Huang <yukih@nvidia.com>.
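The commits "update configs to have same bs when tp>=4" and "add unit test to prevent" point to a config-level check over the example recipes. One way such a check could look is sketched below, under stated assumptions: recipes are already loaded into nested dicts, the TP size lives at `policy.dtensor_cfg.tensor_parallel_size`, and the function name `find_tp_batch_mismatches` is hypothetical. The test actually added in cb22ea7 may differ.

```python
from typing import Any


def find_tp_batch_mismatches(recipes: dict[str, dict[str, Any]]) -> list[str]:
    """Return names of recipes that use TP >= 4 with unequal batch sizes.

    Hypothetical helper. Assumed (not confirmed) layout per recipe:
      policy.train_micro_batch_size
      policy.logprob_batch_size
      policy.dtensor_cfg.tensor_parallel_size
    """
    mismatched = []
    for name, recipe in sorted(recipes.items()):
        policy = recipe.get("policy", {})
        # A recipe without dtensor_cfg is treated as TP size 1 here.
        tp_size = policy.get("dtensor_cfg", {}).get("tensor_parallel_size", 1)
        train_bs = policy.get("train_micro_batch_size")
        logprob_bs = policy.get("logprob_batch_size")
        if tp_size >= 4 and train_bs != logprob_bs:
            mismatched.append(name)
    return mismatched
```

In a test, this would be called on the loaded recipe configs with an expectation of an empty list; loading the YAML files is left out of the sketch.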

yuki-97 force-pushed the yukih/assert-diff-bs branch from ea65045 to cb22ea7 February 3, 2026 14:10

yuki-97 removed and re-added the CI:L1 label (Run doctests, unit tests, and functional tests) Feb 3, 2026

yuki-97 temporarily deployed to nemo-ci February 3, 2026 14:11 with GitHub Actions (inactive)

yuki-97 temporarily deployed to nemo-ci February 3, 2026 15:02 with GitHub Actions (inactive)

yuki-97 had a problem deploying to nemo-ci February 3, 2026 20:19 with GitHub Actions (failure)

yuki-97 temporarily deployed to nemo-ci February 4, 2026 02:28 with GitHub Actions (inactive)

terrykong merged commit b559b7f into main Feb 4, 2026
54 of 58 checks passed

terrykong deleted the yukih/assert-diff-bs branch February 4, 2026 04:14

seonjinn pushed a commit that referenced this pull request Mar 8, 2026

chore: add assert for tp4 batch variant accuracy issue (#1861)

4d6f2b3

Signed-off-by: Yuki Huang <yukih@nvidia.com>

seonjinn pushed a commit that referenced this pull request Mar 8, 2026

chore: add assert for tp4 batch variant accuracy issue (#1861)

79c8e22

Signed-off-by: Yuki Huang <yukih@nvidia.com>

seonjinn pushed a commit that referenced this pull request Mar 9, 2026

chore: add assert for tp4 batch variant accuracy issue (#1861)

ed929e1

Signed-off-by: Yuki Huang <yukih@nvidia.com>

Aniketsy pushed a commit to Aniketsy/RL that referenced this pull request Mar 29, 2026

chore: add assert for tp4 batch variant accuracy issue (NVIDIA-NeMo#1861)

066d61f

Signed-off-by: Yuki Huang <yukih@nvidia.com> Signed-off-by: Aniket Singh Yadav <singhyadavaniket43@gmail.com>

yuki-97 mentioned this pull request Apr 2, 2026

fix: revert logprob_batch_size to keep same perf as before #2192 (merged)

terrykong merged 5 commits into main from yukih/assert-diff-bs
