2975 Fix the perf issue of RandCropByPosNegLabel #3050
merge master
Signed-off-by: Nic Ma <nma@nvidia.com>
BTW, as the numpy version Thanks.
So with the data on the GPU, we're only as fast as the numpy implementation with all on the CPU?
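We could change the logic to use torch only if the data is already on the GPU. If it is on the CPU, use numpy regardless of whether the input was a torch tensor or a numpy array:

    if isinstance(x, torch.Tensor) and x.device.type != "cpu":
        torch.unravel  # shorthand: unravel with torch ops, result stays on the GPU
    else:
        np.unravel  # shorthand for np.unravel_index on CPU data (torch or numpy input)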
Hi @rijobro, thanks for your review. What do you think? Thanks.
@Nic-Ma sounds good, thanks for the explanations!
Fixes #2975.
Description
This PR is a follow-up to #3038 and fixes the training slowdown.
Training speed now matches the NumPy-based benchmark of the 0.7 release (56-58 s with the 21.08 Docker container, 52-54 s with the 21.06 Docker container).
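The main change is to avoid storing the indices on the GPU: we need to call `item()` to get each index value on the CPU and use it to index the image to crop, so keeping the indices on the GPU only adds a device-to-host copy for every sampled index.
Status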
Ready
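A NumPy-only sketch of the idea behind the main change: foreground/background indices are computed and kept as NumPy arrays on the CPU, and each sampled flat index becomes a plain Python `int` that is unravelled into a crop center, so no GPU tensor needs an `item()` call. The function names, the `pos_ratio` parameter (standing in for `pos / (pos + neg)`), and the center-clamping rule are illustrative assumptions, not the actual `RandCropByPosNegLabel` implementation.

```python
import numpy as np


def map_fg_bg_indices(label):
    """Return flattened foreground/background indices as NumPy arrays (CPU memory)."""
    flat = np.asarray(label).ravel()
    fg = np.nonzero(flat > 0)[0]
    bg = np.nonzero(flat == 0)[0]
    return fg, bg


def sample_crop_centers(label, spatial_size, num_samples, pos_ratio, rng):
    """Sample crop centers, picking a foreground index with probability pos_ratio."""
    shape = np.asarray(label).shape
    fg, bg = map_fg_bg_indices(label)
    centers = []
    for _ in range(num_samples):
        use_fg = len(fg) > 0 and (len(bg) == 0 or rng.random() < pos_ratio)
        pool = fg if use_fg else bg
        # Plain Python int taken from a CPU array: no device-to-host copy or GPU sync.
        flat_idx = int(pool[rng.integers(len(pool))])
        center = np.unravel_index(flat_idx, shape)
        clamped = []
        for c, size, dim in zip(center, spatial_size, shape):
            half = size // 2
            # Shift the crop start so the whole crop stays inside the image.
            start = min(max(int(c) - half, 0), max(dim - size, 0))
            clamped.append(start + half)
        centers.append(tuple(clamped))
    return centers


def crop(image, center, spatial_size):
    """Slice a spatial_size window whose start is center - size // 2 on each axis."""
    slices = tuple(slice(c - s // 2, c - s // 2 + s) for c, s in zip(center, spatial_size))
    return image[slices]
```

Only the final slicing touches the image, and it uses Python ints, so the same code path works whether the image array itself lives on the CPU or the GPU.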
Types of changes
- `./runtests.sh -f -u --net --coverage`
- `./runtests.sh --quick --unittests`
- `make html` command in the `docs/` folder