fix: grad norm calculation for dtensor v2 by hemildesai · Pull Request #1693 · NVIDIA-NeMo/RL

hemildesai · 2025-12-23T22:08:49Z

Signed-off-by: Hemil Desai <hemild@nvidia.com>

github-actions · 2025-12-23T22:09:13Z

⚠️ File Consistency Check

Check based on commit: 3cef1ed (PR #1693 from hemil/fix-grad-norm-dtensor-v2)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

coderabbitai · 2025-12-23T22:13:33Z

📝 Walkthrough

Loss scaling in the policy worker's training step is now applied immediately before backpropagation, where it cancels the averaging that FSDP performs across the DP and CP dimensions. The post-run test metrics also now validate gradient norm bounds at step 30.
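A minimal sketch of why scaling the loss cancels a mean reduction, using plain Python rather than real FSDP. Everything here is illustrative: `local_grad`, `mean_reduce`, and the toy data are hypothetical names and values, and `world` only plays the role of `dp_size * cp_size`. It assumes each rank's loss is already normalized so that the desired gradient is the sum over ranks, not the mean.

```python
# Simulate mean-reduced gradients across ranks (no real FSDP involved).

def local_grad(w, x, y, scale=1.0):
    """Gradient of scale * (w*x - y)**2 with respect to w on one rank."""
    return scale * 2.0 * (w * x - y) * x

def mean_reduce(grads):
    """Stand-in for a gradient reduction that averages over ranks."""
    return sum(grads) / len(grads)

w = 0.5
shards = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]  # one (x, y) per simulated rank
world = len(shards)  # plays the role of dp_size * cp_size

# Target: the plain sum of per-rank gradients.
summed = sum(local_grad(w, x, y) for x, y in shards)

# Averaging alone shrinks the gradient by a factor of `world`.
averaged = mean_reduce([local_grad(w, x, y) for x, y in shards])

# Scaling each rank's loss by `world` before backward restores the sum.
rescaled = mean_reduce([local_grad(w, x, y, scale=world) for x, y in shards])
```

With this toy data, `rescaled` matches `summed`, while `averaged` is smaller by a factor of `world`; a gradient norm computed from `averaged` would be understated by the same factor.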

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Training loss scaling `nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py` | Loss is now explicitly scaled by `self.dp_size * self.cp_size` before `backward()` to counteract FSDP's averaging across distributed dimensions; surrounding comments adjusted to reflect new scaling location. |
| Test validation `tests/test_suites/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4.v3.sh` | Post-run metrics validation expanded to include gradient norm bounds at step 30: `0.1 < train/grad_norm < 0.5`, in addition to existing token error check. |
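The gradient norm bound described above can be sketched as a small check. This is only an illustration: the metrics layout (metric name mapping to a dict keyed by step string), the helper name `grad_norm_in_bounds`, and the sample values are assumptions, not the format or tooling the test script actually uses.

```python
# Hypothetical metrics layout: metric name -> {step (as string) -> value}.
sample_metrics = {"train/grad_norm": {"29": 0.31, "30": 0.27}}  # made-up values

def grad_norm_in_bounds(metrics, step=30, low=0.1, high=0.5):
    """Return True when low < train/grad_norm < high at the given step."""
    value = metrics["train/grad_norm"][str(step)]
    return low < value < high

# A value inside the bounds passes; an inflated norm fails the check.
ok = grad_norm_in_bounds(sample_metrics)
too_large = grad_norm_in_bounds({"train/grad_norm": {"30": 2.4}})
```

Bounding the norm on both sides means the check would flag a norm that is off by a constant factor in either direction, which is the kind of error a misplaced loss scale would introduce.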

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

fix: Fix gradient clipping of non-float32 params #1158: Modifies gradient handling in the same training code path (dtensor_policy_worker_v2.train), addressing gradient clipping behavior alongside these loss scaling changes.

Suggested labels

r0.4.0

Suggested reviewers

yfw
joyang-nv
parthchadha

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Results For Major Changes | ⚠️ Warning | PR lacks test results, baseline comparisons, or convergence metrics for critical gradient computation changes affecting distributed training numerics. | Add before-and-after convergence metrics, gradient norm values, loss trajectories, and CI/CD results validating the fix without training regressions. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'fix: grad norm calculation for dtensor v2' directly and clearly describes the main change: fixing gradient norm calculation for dtensor v2, which aligns with the code modifications in dtensor_policy_worker_v2.py and test validation updates. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |


coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 669e70c and 3cef1ed.

📒 Files selected for processing (2)

nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
tests/test_suites/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4.v3.sh

🧰 Additional context used

📓 Path-based instructions (6)

**/*.sh