Describe the bug
When gradients are accumulated across micro batches, they should be averaged rather than summed, so the gradient scale is independent of the global batch size. The current implementation produces a gradient whose scale is proportional to the number of micro batches (gbs / mbs). This is not a big issue for Adam, but it is still a potential issue for other optimizers and for gradient monitoring.
https://github.com/NVIDIA/reinforcer/blob/main/nemo_reinforcer/models/policy/hf_policy.py#L297-L303
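To illustrate the fix, here is a minimal sketch (not the repo's actual code; `train_step`, its arguments, and the micro-batch layout are hypothetical) of accumulation that averages instead of summing. Dividing each micro-batch loss by the number of micro batches before `backward()` makes the accumulated gradient a mean, so its magnitude no longer grows with gbs / mbs:

```python
import torch

def train_step(model, optimizer, micro_batches, loss_fn):
    """Hypothetical helper: gradient accumulation that averages across
    micro batches instead of summing. Scaling each loss by
    1 / num_micro before backward() makes the accumulated gradient
    the mean over the global batch, independent of gbs / mbs."""
    optimizer.zero_grad()
    num_micro = len(micro_batches)
    total_loss = 0.0
    for inputs, targets in micro_batches:
        loss = loss_fn(model(inputs), targets)
        # Scale BEFORE backward so grads average rather than sum.
        (loss / num_micro).backward()
        total_loss += loss.item()
    optimizer.step()
    return total_loss / num_micro
```

With equal-sized micro batches, the accumulated gradient matches the gradient of the mean loss over the full global batch, which is the property the summed version loses.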