We need to ensure that the weights of a distributed layer and those of a normal PyTorch layer can be loaded and saved interchangeably in Shardformer.
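As a rough illustration of the round trip this requirement implies, the sketch below simulates sharding in a single process: a plain `nn.Linear` state dict is split into per-rank shards and then gathered back into a state dict that a plain layer can load. The helpers `shard_state_dict` and `gather_state_dict` are hypothetical names introduced here, not Shardformer APIs, and splitting every tensor along dim 0 is an assumption made for simplicity.

```python
import torch
import torch.nn as nn


def shard_state_dict(full_state, world_size, rank):
    # Hypothetical helper: keep this rank's chunk of every tensor, split along dim 0.
    return {
        name: torch.chunk(tensor, world_size, dim=0)[rank].clone()
        for name, tensor in full_state.items()
    }


def gather_state_dict(shards):
    # Hypothetical helper: concatenate the per-rank shards along dim 0
    # to rebuild the full, unsharded tensors.
    return {
        name: torch.cat([shard[name] for shard in shards], dim=0)
        for name in shards[0]
    }


torch.manual_seed(0)
world_size = 2  # assumed number of ranks, simulated in one process
reference = nn.Linear(4, 6)

# Loading direction: plain PyTorch weights -> one shard per rank.
shards = [
    shard_state_dict(reference.state_dict(), world_size, rank)
    for rank in range(world_size)
]

# Saving direction: shards -> full state dict that a plain layer can load.
restored = nn.Linear(4, 6)
restored.load_state_dict(gather_state_dict(shards))

# Both layers should produce identical outputs after the round trip.
x = torch.randn(3, 4)
round_trip_ok = torch.equal(reference(x), restored(x))
```

A real distributed layer would perform the gather with collective communication across processes and may shard different parameters along different dimensions; the sketch only shows the invariant that the checkpoint format stays compatible in both directions.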