Currently, a checkpoint saves the original fp32 weights in the `zero_pp_rank_*` files, which makes them inaccessible to users who need to use the final result elsewhere without DeepSpeed (e.g. for inference or on a different system).
So we need a `deepspeed.save_model()` that reconstructs the master weights and saves them as `model.bin`.
For context on this need, please see #797.
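
To make the request concrete, here is a rough sketch of the reconstruction step I have in mind. It is only a toy illustration: it assumes each rank holds one flat slice of the fp32 weights and that the parameter names and shapes are known, which may not match the real `zero_pp_rank_*` layout. The function name `merge_flat_partitions` and its arguments are hypothetical, and plain Python lists stand in for tensors.

```python
from math import prod


def merge_flat_partitions(partitions, param_shapes):
    """Rebuild per-parameter fp32 values from per-rank flat partitions.

    partitions:   list of flat lists of floats, one per rank, in rank order
                  (assumed layout, not necessarily DeepSpeed's actual format).
    param_shapes: dict mapping parameter name -> shape tuple, in the order
                  the parameters were flattened.
    """
    # Concatenate the per-rank slices back into one flat buffer.
    flat = [value for part in partitions for value in part]

    state, offset = {}, 0
    for name, shape in param_shapes.items():
        numel = prod(shape)
        if offset + numel > len(flat):
            raise ValueError(f"not enough elements to rebuild {name!r}")
        # Slice out this parameter's values; the shape is kept alongside.
        state[name] = {"shape": shape, "data": flat[offset:offset + numel]}
        offset += numel

    # Anything left after the last parameter is treated as alignment padding.
    return state


# Three ranks, the last one padded with zeros.
shards = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 0.0, 0.0]]
merged = merge_flat_partitions(shards, {"weight": (2, 3), "bias": (1,)})
```

In a real implementation the shards would presumably be loaded with `torch.load`, the slices reshaped into tensors, and the resulting state dict written out with `torch.save` as `model.bin`.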
Thank you.