Describe the bug
Mixed precision training, as enabled in https://github.com/NVIDIA/reinforcer/blob/main/nemo_reinforcer/models/policy/hf_policy.py#L104, is causing convergence issues on GRPO and SFT. There is also a bug in PyTorch + FSDP where the configured optimizer-state precision is not respected: optimizer states are kept in the model precision (bf16). See pytorch/pytorch#143900.
This PR should study convergence for both GRPO and SFT while mixed precision is enabled.
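As a minimal illustration of the optimizer-state precision issue (a sketch of the general PyTorch behavior, not the exact FSDP code path tracked in pytorch/pytorch#143900): Adam allocates its moment buffers with `zeros_like(param)`, so if the model is held in bf16, the optimizer states silently end up in bf16 as well, which can degrade convergence.

```python
import torch

# Assumption for illustration: a single bf16 parameter stands in for a model
# wrapped with full-bf16 mixed precision. Adam creates its moment buffers via
# zeros_like(param), so the states inherit the parameter's bf16 dtype.
param = torch.nn.Parameter(torch.randn(4, 4, dtype=torch.bfloat16))
opt = torch.optim.Adam([param], lr=1e-3)

param.grad = torch.randn_like(param)
opt.step()

state = opt.state[param]
# Both Adam moments are kept in bf16, not fp32, because the param is bf16.
print(state["exp_avg"].dtype, state["exp_avg_sq"].dtype)
```

Keeping the master weights and optimizer states in fp32 (and casting only the compute to bf16) avoids this; the FSDP bug referenced above is that the configured fp32 state precision is not honored.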