1533 Fix distributed data parallel issue in ClassificationSaver #1535
merge master
/black
Signed-off-by: Nic Ma <nma@nvidia.com>
5597ad9 to cc53588
/black
Signed-off-by: monai-bot <monai.miccai2019@gmail.com>
Signed-off-by: Nic Ma <nma@nvidia.com>
Signed-off-by: Nic Ma <nma@nvidia.com>
/black
ericspod left a comment
I only mentioned a few small things. According to the code coverage output, the tests aren't being run under multi-GPU with PyTorch 1.7, so string_list_all_gather isn't being tested, but it looks OK.
Signed-off-by: Nic Ma <nma@nvidia.com>
/black
5f829f9 to 3b005bd
/black
Signed-off-by: Nic Ma <nma@nvidia.com>
34e56d9 to fc08308
/black
Signed-off-by: monai-bot <monai.miccai2019@gmail.com>
@ericspod the code coverage is inaccurate in our current setup because the multiprocess executions are not tracked properly. I'll create an issue.
Fixes #1533.
Description
This PR fixes the file-saving issue of ClassificationSaver in distributed data parallel mode.
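
For readers unfamiliar with the problem: in distributed data parallel mode each rank only sees its own share of filenames and predictions, so a single output file needs them gathered first. The review mentions a `string_list_all_gather` helper, but its implementation is not shown in this conversation. The following is only a minimal sketch of one way such a gather could work, simulated in a single process with the standard library; the function names, the tab delimiter, and the zero padding are all assumptions, not the code in this PR.

```python
from typing import List

# Assumed separator; it must not occur inside any of the gathered strings.
DELIMITER = "\t"


def encode_strings(strings: List[str]) -> List[int]:
    """Join the strings and return their UTF-8 byte values.

    A list of ints stands in for the integer tensor a real collective would send.
    """
    return list(DELIMITER.join(strings).encode("utf-8"))


def decode_strings(payload: List[int], length: int) -> List[str]:
    """Drop padding beyond `length` and split the bytes back into strings.

    Limitation of this sketch: a rank holding only one empty string is
    indistinguishable from a rank holding nothing.
    """
    if length == 0:
        return []
    return bytes(payload[:length]).decode("utf-8").split(DELIMITER)


def simulated_all_gather(per_rank: List[List[str]]) -> List[str]:
    """Mimic the result every rank would see after gathering all string lists.

    `per_rank[i]` is the (hypothetical) list of strings held by rank i.
    """
    payloads = [encode_strings(strings) for strings in per_rank]
    lengths = [len(p) for p in payloads]
    max_len = max(lengths, default=0)
    # Collectives generally need equal-sized buffers, so pad to the longest one.
    padded = [p + [0] * (max_len - len(p)) for p in payloads]

    gathered: List[str] = []
    for payload, length in zip(padded, lengths):
        gathered.extend(decode_strings(payload, length))
    return gathered


if __name__ == "__main__":
    ranks = [["img_0.nii", "img_1.nii"], ["img_2.nii"], []]
    print(simulated_all_gather(ranks))
```

In a real multi-process run, the padding and the exchange of lengths would be handled by the distributed backend's collective calls; only the rank that writes the file would then use the combined list.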
Status
Ready
Types of changes
- `./runtests.sh --codeformat --coverage`
- `./runtests.sh --quick`
- `make html` command in the `docs/` folder.