Conversation

@Nic-Ma (Contributor) commented Sep 30, 2021

Fixes #2975.

Description

This PR is a follow-up to ticket #3038 and fixes the training slowdown issue.
Training speed now matches the numpy-based benchmark of the 0.7 release (56s-58s with the 21.08 docker image, 52s-54s with the 21.06 image).
The main change is to avoid storing the indices on the GPU, because we ultimately need their item() values on the CPU in order to index into the image for cropping.
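To illustrate the point above, here is a minimal sketch (not the actual MONAI code; the names `fg_indices` and `pick_centre` are made up for illustration) of why the fg/bg index tensors are best kept on the CPU: drawing a random crop centre calls `.item()`, which is cheap for a CPU tensor but forces a device-to-host copy and stream synchronisation for a CUDA tensor.

```python
import torch

# Hypothetical example: flat foreground voxel indices, kept on the CPU.
fg_indices = torch.tensor([3, 17, 42, 99])

def pick_centre(indices: torch.Tensor) -> int:
    """Pick one stored index at random and return it as a plain Python int."""
    i = torch.randint(len(indices), (1,)).item()
    # .item() on a CPU tensor is cheap; on a CUDA tensor it would trigger
    # a GPU->CPU transfer and a synchronisation on every crop.
    return int(indices[i].item())

centre = pick_centre(fg_indices)
```

Because the crop centre is consumed as a Python int anyway, placing the indices on the GPU would only add transfer overhead without any compute benefit.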

Status

Ready

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@Nic-Ma (Contributor, Author) commented Sep 30, 2021

/black

Signed-off-by: Nic Ma <nma@nvidia.com>
@Nic-Ma (Contributor, Author) commented Sep 30, 2021

/black

@Nic-Ma (Contributor, Author) commented Sep 30, 2021

BTW, since the numpy version of unravel_index is slightly faster than the PyTorch version, running the fast training tutorial with numpy data makes the total time about 1s faster.

Thanks.
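For reference, a minimal example of `np.unravel_index`, the numpy operation being compared here: it converts flat (1D) indices into multi-dimensional coordinates for a given array shape.

```python
import numpy as np

# Convert flat indices 0, 5, and 11 into (row, col) coordinates
# for an array of shape (3, 4).
rows, cols = np.unravel_index([0, 5, 11], (3, 4))
# flat index 5 -> row 1, col 1; flat index 11 -> row 2, col 3
print(rows.tolist(), cols.tolist())  # [0, 1, 2] [0, 1, 3]
```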

Signed-off-by: Nic Ma <nma@nvidia.com>
@Nic-Ma (Contributor, Author) commented Sep 30, 2021

/black

@rijobro (Contributor) commented Sep 30, 2021

Now the training speed is same as the numpy version benchmark of 0.7 release

So with the data on the GPU, we're only as fast as the numpy implementation with all on the CPU?

BTW, as the numpy version unravel_index is slightly faster than PyTorch version

We could change the logic to use torch only if the data is already on the GPU; if it is on the CPU, use numpy regardless of whether the input was torch or numpy:

if isinstance(x, torch.Tensor) and x.device.type != "cpu":
    # use the torch unravel_index implementation
    ...
else:
    # use np.unravel_index
    ...

@Nic-Ma (Contributor, Author) commented Sep 30, 2021

Hi @rijobro,

Thanks for your review.
There are two pieces of data in this case: the image data and the bg/fg indices.
Moving the image to the GPU helps accelerate training.
But putting the bg/fg indices on the GPU doesn't help, because when we use the indices to randomly crop the image, we have to bring each index value back to the CPU via torch.item(). That's why I think we should always keep the indices on the CPU.
As for the unravel_index question, I'd suggest keeping the logic simple and straightforward until we have fully completed backends for all the transforms: just use torch APIs for tensors and numpy APIs for numpy arrays. The difference is only slight for now, and a future PyTorch version may improve it.

What do you think?

Thanks.
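The "match the input's backend" rule described above could be sketched as follows (a hypothetical helper named `nonzero_any`, not MONAI's actual code): torch inputs go through torch ops, numpy inputs through numpy ops, with no cross-library conversion.

```python
import numpy as np
import torch

def nonzero_any(x):
    """Return indices of nonzero elements using the input's own backend."""
    if isinstance(x, torch.Tensor):
        # torch path: nonzero returns an (n, 1) tensor for 1D input
        return torch.nonzero(x, as_tuple=False).flatten()
    # numpy path: np.nonzero returns a tuple of index arrays
    return np.nonzero(x)[0]

a = nonzero_any(np.array([0, 1, 0, 2]))     # numpy in, numpy out
b = nonzero_any(torch.tensor([0, 1, 0, 2]))  # torch in, torch out
```

Keeping each backend self-contained avoids implicit host/device or array/tensor conversions inside the transforms.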

@Nic-Ma Nic-Ma enabled auto-merge (squash) September 30, 2021 14:18
@rijobro (Contributor) commented Sep 30, 2021

@Nic-Ma sounds good, thanks for the explanations!

@Nic-Ma Nic-Ma merged commit 7c46f8e into Project-MONAI:dev Oct 1, 2021
Development

Successfully merging this pull request may close these issues: Torch FgBgToIndices.