Description
Describe the bug
The Spacing transform relies on torch.inverse() to invert the affine matrix. However, the result deviates from NumPy's, and the size of the deviation changes between PyTorch releases. As a result, the integration test test_integration_segmentation3d is affected when the PyTorch version changes.
As an example, torch.inverse() gives different results in the NVIDIA PyTorch containers 22.09 and 22.11. Steps to reproduce are listed below. The error appears to be on the order of 1e-7 on Ampere GPUs.
We use the NumPy function np.linalg.inv() as the gold standard.
For this particular input, the error in 22.11/22.12/23.01 is about 3 times higher than in 22.09.
We tried turning off TF32 and setting the matmul precision to highest (FP32) in the PyTorch containers (22.11-23.01), and the result is the same.
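As a complementary check, each inverse can also be measured against the identity, with a float64 inverse as the reference. The following is only a minimal sketch under stated assumptions: it uses NumPy alone, the helper name `residual` is hypothetical, and the matrix values are the rounded ones printed in the results below rather than the full-precision values from the reproduction script.

```python
import numpy as np

# Affine matrix from this report, using the rounded values as printed by PyTorch
# (an assumption for brevity; the reproduction script has the full-precision values).
C32 = np.array(
    [
        [1.8692e-02, 0.0, 0.0, -9.9065e-01],
        [0.0, 1.2500e-02, 0.0, -9.9375e-01],
        [0.0, 0.0, 1.0989e-02, -9.9451e-01],
        [0.0, 0.0, 0.0, 1.0],
    ],
    dtype=np.float32,
)


def residual(mat, inv):
    """Sum of absolute deviations of mat @ inv from the identity, evaluated in float64."""
    mat64 = mat.astype(np.float64)
    inv64 = inv.astype(np.float64)
    return float(np.abs(mat64 @ inv64 - np.eye(mat.shape[-1])).sum())


inv32 = np.linalg.inv(C32)                     # inverse computed in float32
inv64 = np.linalg.inv(C32.astype(np.float64))  # float64 reference inverse

print("float32 residual:", residual(C32, inv32))
print("float64 residual:", residual(C32, inv64))
```

The same helper could be applied to the output of torch.inverse() after converting a CPU tensor with `.numpy()`, which would give one number per container that does not depend on a second float32 computation.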
Result in 22.09:
Input tensor:
tensor([[[1.8692e-02, 0.0000e+00, 0.0000e+00, -9.9065e-01],
[0.0000e+00, 1.2500e-02, 0.0000e+00, -9.9375e-01],
[0.0000e+00, 0.0000e+00, 1.0989e-02, -9.9451e-01],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 1.0000e+00]]])
Output of torch.inverse in PyTorch 22.09:
tensor([[[5.3500e+01, 0.0000e+00, -0.0000e+00, 5.3000e+01],
[0.0000e+00, 8.0000e+01, -0.0000e+00, 7.9500e+01],
[0.0000e+00, 0.0000e+00, 9.1000e+01, 9.0500e+01],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 1.0000e+00]]])
Error compared with numpy
tensor(1.1921e-07)
Result in 22.11:
Input tensor:
tensor([[[1.8692e-02, 0.0000e+00, 0.0000e+00, -9.9065e-01],
[0.0000e+00, 1.2500e-02, 0.0000e+00, -9.9375e-01],
[0.0000e+00, 0.0000e+00, 1.0989e-02, -9.9451e-01],
[0.0000e+00, 0.0000e+00, 0.0000e+00, 1.0000e+00]]])
Output of torch.inverse in PyTorch 22.11:
tensor([[[ 5.3500e+01, -2.2252e-06, -1.3113e-06, 5.3000e+01],
[ 6.3144e-06, 8.0000e+01, -1.9670e-06, 7.9500e+01],
[-6.6143e-06, -9.4399e-06, 9.1000e+01, 9.0500e+01],
[-7.4506e-09, -4.1986e-08, -2.4742e-08, 1.0000e+00]]])
Error compared with numpy
tensor(3.4737e-07)
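The input here contains only a per-axis scale on the diagonal and a translation in the last column, so its exact inverse has zeros everywhere off the diagonal except the last column, as in the 22.09 output. The sketch below works this out in plain Python to show where the values 53.5, 80, 91 and 53, 79.5, 90.5 come from. The helper `invert_scale_translate` is hypothetical and assumes no rotation or shear, so it is an illustration of the arithmetic and not a proposed replacement for torch.inverse() in the Spacing transform.

```python
def invert_scale_translate(affine):
    """Closed-form inverse of a square affine holding only a diagonal scale
    and a translation in the last column (hypothetical helper; no rotation or shear)."""
    n = len(affine) - 1
    inv = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        scale = affine[i][i]
        shift = affine[i][n]
        inv[i][i] = 1.0 / scale     # inverse of the scale
        inv[i][n] = -shift / scale  # inverse of the translation
    inv[n][n] = 1.0
    return inv


# Full-precision input values copied from the reproduction script.
C = [
    [0.01869158819317817687988281250000, 0.0, 0.0, -0.99065423011779785156250000000000],
    [0.0, 0.01250000018626451492309570312500, 0.0, -0.99374997615814208984375000000000],
    [0.0, 0.0, 0.01098901126533746719360351562500, -0.99450546503067016601562500000000],
    [0.0, 0.0, 0.0, 1.0],
]

for row in invert_scale_translate(C):
    print(row)
```

With this input the off-diagonal entries outside the last column are exactly zero by construction, which is the pattern seen in 22.09 and lost in 22.11.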
To Reproduce
- Run the following snippet in a shell to get the result in the PyTorch 22.09 container.
docker run -it --rm --gpus=all -e NVIDIA_TF32_OVERRIDE=0 nvcr.io/nvidia/pytorch:22.09-py3 bash -c "python << EOF
import torch
import numpy as np
torch.set_printoptions(sci_mode=True)
torch.set_float32_matmul_precision(\"highest\")
C = torch.tensor([[[ 0.01869158819317817687988281250000,
0.00000000000000000000000000000000,
0.00000000000000000000000000000000,
-0.99065423011779785156250000000000],
[ 0.00000000000000000000000000000000,
0.01250000018626451492309570312500,
0.00000000000000000000000000000000,
-0.99374997615814208984375000000000],
[ 0.00000000000000000000000000000000,
0.00000000000000000000000000000000,
0.01098901126533746719360351562500,
-0.99450546503067016601562500000000],
[ 0.00000000000000000000000000000000,
0.00000000000000000000000000000000,
0.00000000000000000000000000000000,
1.00000000000000000000000000000000]]])
print(\"Input tensor: \")
print(C)
print(\"Output of torch.inverse in PyTorch 22.09: \")
print(torch.inverse(C))
print(\"Error compared with numpy\")
print((torch.tensor(np.array(C) @ np.linalg.inv(np.array(C))) - C @ torch.inverse(C)).abs().sum())
EOF
"
- Run the same snippet in container 22.11 or later. Here we use 22.11 as an example:
docker run -it --rm --gpus=all -e NVIDIA_TF32_OVERRIDE=0 nvcr.io/nvidia/pytorch:22.11-py3 bash -c "python << EOF
import torch
import numpy as np
torch.set_printoptions(sci_mode=True)
torch.set_float32_matmul_precision(\"highest\")
C = torch.tensor([[[ 0.01869158819317817687988281250000,
0.00000000000000000000000000000000,
0.00000000000000000000000000000000,
-0.99065423011779785156250000000000],
[ 0.00000000000000000000000000000000,
0.01250000018626451492309570312500,
0.00000000000000000000000000000000,
-0.99374997615814208984375000000000],
[ 0.00000000000000000000000000000000,
0.00000000000000000000000000000000,
0.01098901126533746719360351562500,
-0.99450546503067016601562500000000],
[ 0.00000000000000000000000000000000,
0.00000000000000000000000000000000,
0.00000000000000000000000000000000,
1.00000000000000000000000000000000]]])
print(\"Input tensor: \")
print(C)
print(\"Output of torch.inverse in PyTorch 22.11: \")
print(torch.inverse(C))
print(\"Error compared with numpy\")
print((torch.tensor(np.array(C) @ np.linalg.inv(np.array(C))) - C @ torch.inverse(C)).abs().sum())
EOF
"