Could you please merge this fix for a wrong dtype from downstream: https://github.com/bigscience-workshop/Megatron-DeepSpeed/commit/42fe3b3986c5414bfeb7affafcb0f4b6615ad86c Thank you!