[save_checkpoint] document the requirement to call for all ranks#801

Merged
tjruwase merged 1 commit into deepspeedai:master from stas00:checkpoint-doc
Feb 26, 2021

stas00 commented Feb 26, 2021

As I was told in #797, all processes must call this method. I unknowingly called it only on rank 0, and the program hung. This PR documents the requirement in the tutorial and the API doc.
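To illustrate the failure mode, here is a toy simulation, not DeepSpeed code. It assumes the hang comes from a step inside `save_checkpoint` that every rank has to reach. `ToyEngine`, `run`, and the `threading.Barrier` stand-in are hypothetical names made up for this sketch, with threads playing the role of ranks and a timeout standing in for the hang.

```python
import threading

WORLD_SIZE = 4  # number of simulated ranks (assumption for this sketch)


class ToyEngine:
    """Hypothetical stand-in for a training engine; not the DeepSpeed API."""

    def __init__(self, rank, barrier):
        self.rank = rank
        self.barrier = barrier

    def save_checkpoint(self, save_dir, tag):
        # Stand-in for a collective step: it completes only if every rank arrives.
        try:
            self.barrier.wait(timeout=0.5)
        except threading.BrokenBarrierError:
            # A real job would block here indefinitely; the timeout just
            # lets the simulation finish and report the problem.
            return "hang"
        return "saved"


def run(only_rank0):
    """Run one simulated save and return {rank: outcome} for the ranks that called it."""
    barrier = threading.Barrier(WORLD_SIZE)
    results = {}

    def worker(rank):
        engine = ToyEngine(rank, barrier)
        if only_rank0 and rank != 0:
            return  # the mistake: non-zero ranks skip the call
        results[rank] = engine.save_checkpoint("ckpt", "step100")

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(WORLD_SIZE)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


wrong = run(only_rank0=True)   # guarded by a rank check: rank 0 waits alone
right = run(only_rank0=False)  # called on every rank: all proceed
print(wrong)
print(right)
```

In a real training script the corresponding fix is to move the `save_checkpoint` call out of any `if rank == 0:` guard so that every process reaches it.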

Hmm, `load_checkpoint` should probably have a similar note too.

Thank you.

tjruwase merged commit 7eb083c into deepspeedai:master on Feb 26, 2021
stas00 deleted the checkpoint-doc branch on February 26, 2021 21:10