
Cuda sync issue after update base image #8054

@KumoLiu

Description

Starting test: test_value_0_fp32 (tests.test_convert_to_trt.TestConvertToTRT)...
WARNING:root:Given dtype that does not have direct mapping to torch (dtype.unknown), defaulting to torch.float
WARNING:root:Given dtype that does not have direct mapping to torch (dtype.unknown), defaulting to torch.float
WARNING: [Torch-TensorRT] - Detected and removing exception in TorchScript IR for node:  = prim::If(%387) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:562:8  block0():    %388 : str = aten::format(%318, %386) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:563:29     = prim::RaiseException(%388, %317) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:563:12    -> ()  block1():    -> ()
WARNING: [Torch-TensorRT] - Detected and removing exception in TorchScript IR for node:  = prim::If(%401) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:562:8  block0():    %402 : str = aten::format(%318, %400) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:563:29     = prim::RaiseException(%402, %317) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:563:12    -> ()  block1():    -> ()
WARNING: [Torch-TensorRT] - Detected and removing exception in TorchScript IR for node:  = prim::If(%415) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:562:8  block0():    %416 : str = aten::format(%318, %414) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:563:29     = prim::RaiseException(%416, %317) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:563:12    -> ()  block1():    -> ()
WARNING: [Torch-TensorRT] - Detected and removing exception in TorchScript IR for node:  = prim::If(%429) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:562:8  block0():    %430 : str = aten::format(%318, %428) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:563:29     = prim::RaiseException(%430, %317) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:563:12    -> ()  block1():    -> ()
WARNING: [Torch-TensorRT] - Conv3d layer with kernel size = 1 configuration incurs a failure with TensorRT tactic optimizer in some cases.     Github issue: https://github.com/pytorch/TensorRT/issues/1445. Other conv variants do not have this issue.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Environment variable NVIDIA_TF32_OVERRIDE=0 but BuilderFlag::kTF32 is set. Disabling TF32.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Environment variable NVIDIA_TF32_OVERRIDE=0 but BuilderFlag::kTF32 is set. Disabling TF32.
.Finished test: test_value_0_fp32 (tests.test_convert_to_trt.TestConvertToTRT) (32.1s)
Starting test: test_value_1_fp16 (tests.test_convert_to_trt.TestConvertToTRT)...
WARNING:root:Given dtype that does not have direct mapping to torch (dtype.unknown), defaulting to torch.float
WARNING:root:Given dtype that does not have direct mapping to torch (dtype.unknown), defaulting to torch.float
WARNING: [Torch-TensorRT] - Detected and removing exception in TorchScript IR for node:  = prim::If(%387) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:562:8  block0():    %388 : str = aten::format(%318, %386) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:563:29     = prim::RaiseException(%388, %317) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:563:12    -> ()  block1():    -> ()
WARNING: [Torch-TensorRT] - Detected and removing exception in TorchScript IR for node:  = prim::If(%401) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:562:8  block0():    %402 : str = aten::format(%318, %400) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:563:29     = prim::RaiseException(%402, %317) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:563:12    -> ()  block1():    -> ()
WARNING: [Torch-TensorRT] - Detected and removing exception in TorchScript IR for node:  = prim::If(%415) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:562:8  block0():    %416 : str = aten::format(%318, %414) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:563:29     = prim::RaiseException(%416, %317) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:563:12    -> ()  block1():    -> ()
WARNING: [Torch-TensorRT] - Detected and removing exception in TorchScript IR for node:  = prim::If(%429) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:562:8  block0():    %430 : str = aten::format(%318, %428) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:563:29     = prim::RaiseException(%430, %317) # /usr/local/lib/python3.10/dist-packages/torch/nn/modules/batchnorm.py:563:12    -> ()  block1():    -> ()
WARNING: [Torch-TensorRT] - Conv3d layer with kernel size = 1 configuration incurs a failure with TensorRT tactic optimizer in some cases.     Github issue: https://github.com/pytorch/TensorRT/issues/1445. Other conv variants do not have this issue.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Environment variable NVIDIA_TF32_OVERRIDE=0 but BuilderFlag::kTF32 is set. Disabling TF32.
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Environment variable NVIDIA_TF32_OVERRIDE=0 but BuilderFlag::kTF32 is set. Disabling TF32.

======================================================================
FAIL: test_value_043_cuda (tests.test_hausdorff_distance.TestHausdorffDistance)
device: cuda metric: euclidean directed:False expected: 20.223748416156685
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/parameterized/parameterized.py", line 620, in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
  File "/workspace/MONAI/tests/test_hausdorff_distance.py", line 194, in test_value
    np.testing.assert_allclose(expected_value, result.cpu(), rtol=1e-6)
  File "/usr/local/lib/python3.10/dist-packages/numpy/testing/_private/utils.py", line 1592, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/usr/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/usr/local/lib/python3.10/dist-packages/numpy/testing/_private/utils.py", line 862, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=1e-06, atol=0

Mismatched elements: 1 / 1 (100%)
Max absolute difference: 15.198812
Max relative difference: 3.0246766
 x: array(20.223748)
 y: array([5.024938], dtype=float32)
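The failure above is a plain value mismatch, not a tolerance issue: the CUDA run returned roughly 5.02 where the reference table expects 20.22. A minimal, GPU-free sketch that reproduces the assertion pattern from the traceback, with the two values copied from the log (only to illustrate how `np.testing.assert_allclose` reports the mismatch):

```python
import numpy as np

# Values taken from the failing test log above.
expected = np.array(20.223748416156685)            # reference Hausdorff distance
actual = np.array([5.024938], dtype=np.float32)    # value returned on CUDA

# Same call shape as tests/test_hausdorff_distance.py line 194.
try:
    np.testing.assert_allclose(expected, actual, rtol=1e-6)
    raised = False
except AssertionError:
    # The error message reports the max absolute/relative difference,
    # matching the "Not equal to tolerance" block in the log.
    raised = True

print(raised)  # → True
```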

======================================================================
FAIL: test_value_078_cuda (tests.test_hausdorff_distance.TestHausdorffDistance)
device: cuda metric: euclidean directed:True expected: 19.924858845171276
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/parameterized/parameterized.py", line 620, in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
  File "/workspace/MONAI/tests/test_hausdorff_distance.py", line 194, in test_value
    np.testing.assert_allclose(expected_value, result.cpu(), rtol=1e-6)
  File "/usr/local/lib/python3.10/dist-packages/numpy/testing/_private/utils.py", line 1592, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/usr/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/usr/local/lib/python3.10/dist-packages/numpy/testing/_private/utils.py", line 862, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=1e-06, atol=0

Mismatched elements: 1 / 1 (100%)
Max absolute difference: 19.924858
Max relative difference: inf
 x: array(19.924859)
 y: array([0.], dtype=float32)

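Given the issue title, one hypothesis is that the metric result is read back before the GPU kernel that produces it has completed. A hypothetical debugging helper (the name `metric_to_host` is not part of MONAI; this is a sketch, not the fix): `Tensor.cpu()` already issues a blocking copy, but adding an explicit `torch.cuda.synchronize()` before the read-back makes the ordering unambiguous and can help separate a read-back-ordering bug from a wrong-kernel-result bug.

```python
import torch

def metric_to_host(result: torch.Tensor) -> torch.Tensor:
    """Copy a metric result to the host, waiting for pending GPU work first.

    Hypothetical debugging aid: if the wrong value persists even with an
    explicit synchronize, the problem is in the computation itself rather
    than in asynchronous read-back.
    """
    if result.is_cuda:
        torch.cuda.synchronize(result.device)  # wait for all queued kernels
    return result.cpu()
```

On a CPU tensor the helper is a no-op pass-through, so it can be dropped into the failing test unconditionally while bisecting the base-image update.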