Because the fp32 params are sharded evenly across ranks, the default `state_dict` is effectively broken: no single rank holds the full parameters any more. We therefore override the `state_dict` method so that checkpoints can still be saved and loaded.
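As a rough illustration of why the override is needed, the following is a minimal single-process sketch, not the actual implementation. It uses plain Python lists instead of tensors, and the names `shard_evenly` and `ShardedParams` are hypothetical. The loop that concatenates the shards stands in for a collective gather; in a real distributed run that step would be something like `torch.distributed.all_gather`.

```python
from typing import Dict, List


def shard_evenly(flat: List[float], world_size: int) -> List[List[float]]:
    """Pad `flat` to a multiple of `world_size` and split it into equal chunks."""
    shard_len = -(-len(flat) // world_size)  # ceiling division
    padded = flat + [0.0] * (shard_len * world_size - len(flat))
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


class ShardedParams:
    """Toy stand-in (hypothetical) for a wrapper holding evenly sharded fp32 params."""

    def __init__(self, params: Dict[str, List[float]], world_size: int):
        self.world_size = world_size
        # Remember each parameter's name and length so it can be rebuilt later.
        self.layout = [(name, len(values)) for name, values in params.items()]
        flat = [v for values in params.values() for v in values]
        # After this point no single shard contains a complete parameter set.
        self.shards = shard_evenly(flat, world_size)

    def state_dict(self) -> Dict[str, List[float]]:
        # Stand-in for a gather across ranks: concatenate every shard.
        flat = [v for shard in self.shards for v in shard]
        out, offset = {}, 0
        for name, length in self.layout:
            out[name] = flat[offset:offset + length]
            offset += length
        return out  # trailing padding is never read, so it is dropped

    def load_state_dict(self, state: Dict[str, List[float]]) -> None:
        # Flatten in the recorded order, then re-shard evenly.
        flat = [v for name, _ in self.layout for v in state[name]]
        self.shards = shard_evenly(flat, self.world_size)


model = ShardedParams({"weight": [1.0, 2.0, 3.0], "bias": [4.0, 5.0]}, world_size=2)
full = model.state_dict()
```

Here each of the two simulated ranks holds three values, one of them padding, and `state_dict` reassembles `weight` and `bias` at their original lengths. Recording the layout at sharding time is one simple way to make the round trip possible; a real wrapper may track shapes and dtypes differently.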