Describe the feature
BFloat16 may be useful when training LLMs. We should support BF16 mixed precision training.
UPDATED: I found that AVX does not support loading bf16 directly. BF16 support for the CPU Adam kernel may not be implemented in the near future.
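For context on the AVX point: a bf16 value is the upper 16 bits of an IEEE-754 fp32 value, so without a native bf16 load a kernel has to widen and shift the raw 16-bit values itself. Below is a minimal scalar sketch of that conversion in Python. The function names `fp32_to_bf16_bits` and `bf16_bits_to_fp32` are hypothetical and only for illustration; this is not taken from the CPU Adam kernel.

```python
import struct

def fp32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bf16 pattern for x (round-to-nearest-even).

    Assumes x fits in fp32; struct.pack raises OverflowError otherwise.
    """
    if x != x:  # NaN: return a canonical quiet NaN pattern
        return 0x7FC0
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add 0x7FFF plus the lowest kept bit so exact ties round to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_fp32(b: int) -> float:
    """Widen a 16-bit bf16 pattern to fp32 by shifting it into the high half."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

if __name__ == "__main__":
    for value in (1.0, -2.0, 3.140625, 0.1):
        b = fp32_to_bf16_bits(value)
        print(f"{value!r} -> 0x{b:04X} -> {bf16_bits_to_fp32(b)!r}")
```

A vectorized kernel would presumably do the same thing per lane: load 16-bit integers, zero-extend them to 32 bits, and shift left by 16 before reinterpreting the result as fp32.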
Possible roadmap:
force_fp32_grad to control this.