
Gradient scale in HfPolicyWorker.train #84

@KiddoZhu

Describe the bug

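When gradients are accumulated across micro batches, they should be averaged rather than summed, so that the gradient scale does not depend on the global batch size. The current implementation sums them, so the accumulated gradient is scaled by gbs / mbs (global batch size divided by micro batch size, i.e. the number of micro batches). This is not a big issue for Adam, whose update is nearly invariant to a constant rescaling of the gradient. It is still a potential issue for other optimizers (e.g. plain SGD, where the factor acts as a learning-rate multiplier) and for monitoring gradients, since logged gradient norms grow with the number of micro batches.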
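To make the gbs / mbs factor explicit, here is a short derivation. It assumes each micro-batch loss is the mean of its per-sample losses, $L_k = \frac{1}{b}\sum_{i \in \mathcal{M}_k} \ell_i$; the symbols $B$ (gbs), $b$ (mbs), $n = B/b$, $\mathcal{M}_k$ and $\ell_i$ are introduced here for illustration only:

```math
g_{\text{acc}} = \sum_{k=1}^{n} \nabla_\theta L_k
= \sum_{k=1}^{n} \frac{1}{b} \sum_{i \in \mathcal{M}_k} \nabla_\theta \ell_i
= \frac{B}{b} \cdot \frac{1}{B} \sum_{i=1}^{B} \nabla_\theta \ell_i
= n \, g_{\text{full}}
```

Scaling each micro-batch loss by $1/n$ before the backward pass (or, equivalently, dividing the accumulated gradient by $n$) recovers $g_{\text{full}}$, independent of how the global batch is split.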
 
https://github.com/NVIDIA/reinforcer/blob/main/nemo_reinforcer/models/policy/hf_policy.py#L297-L303
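A minimal pure-Python sketch of the same effect. It is hypothetical and does not reproduce the linked loop in `hf_policy.py`: it uses a scalar linear model with a mean-squared-error loss, and the helper names `mse_grad` and `accumulate` are made up for illustration.

```python
# Hypothetical illustration of gradient accumulation; not the code in hf_policy.py.
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # (x, y) pair


def mse_grad(w: float, batch: Sequence[Sample]) -> float:
    """d/dw of the mean squared error (1/b) * sum((w*x - y)**2) over the batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulate(w: float, data: Sequence[Sample], mbs: int, average: bool) -> float:
    """Accumulate per-micro-batch gradients, either summed or averaged."""
    micro_batches: List[Sequence[Sample]] = [
        data[i : i + mbs] for i in range(0, len(data), mbs)
    ]
    n = len(micro_batches)  # equals gbs / mbs when mbs divides gbs
    grad = 0.0
    for mb in micro_batches:
        g = mse_grad(w, mb)
        # Summing reproduces the reported behavior; dividing by n averages.
        grad += g / n if average else g
    return grad


gbs, mbs = 8, 2
data = [(float(i), 2.0 * i + 1.0) for i in range(gbs)]
w = 0.5

full = mse_grad(w, data)  # gradient of the whole global batch at once
summed = accumulate(w, data, mbs, average=False)
averaged = accumulate(w, data, mbs, average=True)

print(full, summed, averaged)  # -59.5 -238.0 -59.5
```

With gbs = 8 and mbs = 2 the summed gradient is 4x the full-batch gradient, matching the gbs / mbs factor, while the averaged one matches it exactly. In a PyTorch loop the analogous change is typically to divide each micro-batch loss by the number of micro batches before calling `backward()`.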

Metadata

Labels

enhancement (New feature or request)
