[BUG] compatibility with ROCm 5 #2009

@njzjz

Bug summary

The current code is not compatible with ROCm 5.

DeePMD-kit Version

devel

TensorFlow Version

2.9.1

How did you download the software?

Built from source

Input Files, Running Commands, Error Log, etc.

  1. CMake tries to find two libraries

set(ROCM_FIND_COMPONENTS hip_hcc hiprtc)

However, I could not find hip_hcc in either ROCm 5.1 or ROCm 5.3. hiprtc is also missing from ROCm 5.1; is that related to the installation?

$ ls /global/software/rocm/rocm-5.1.3/lib
CMakeFiles		   libamdhip64.so.5	      libhipsparse.so		     libmigraphx_c.so.3		      libmigraphx_ref.so.2	    librocalution.so		     librocfft-device-1.so	      librocm-core.so		    librocrand.so.1		 libroctx64.so.1
cmake			   libamdhip64.so.5.1.50103   libhipsparse.so.0		     libmigraphx_c.so.3.0.50103       libmigraphx_ref.so.2.1.50103  librocalution.so.0		     librocfft-device-1.so.0	      librocm-core.so.1		    librocrand.so.1.1.50103	 libroctx64.so.1.0.50103
libMIOpen.so		   libhipblas.so	      libhipsparse.so.0.1.50103      libmigraphx_device.so	      libmigraphx_tf.so		    librocalution.so.0.1.50103	     librocfft-device-1.so.0.1.50103  librocm-core.so.1.0.50103     librocsolver.so		 migraphx.cpython-36m-x86_64-linux-gnu.so
libMIOpen.so.1		   libhipblas.so.0	      libhsa-amd-aqlprofile64.so     libmigraphx_device.so.2	      libmigraphx_tf.so.2	    librocalution_hip.so	     librocfft-device-2.so	      librocm-dbgapi.so		    librocsolver.so.0		 migraphx.cpython-37m-x86_64-linux-gnu.so
libMIOpen.so.1.0.50103	   libhipblas.so.0.1.50103    libhsa-runtime64.so	     libmigraphx_device.so.2.1.50103  libmigraphx_tf.so.2.1.50103   librocalution_hip.so.0	     librocfft-device-2.so.0	      librocm-dbgapi.so.0	    librocsolver.so.0.1.50103	 migraphx.cpython-38-x86_64-linux-gnu.so
libOpenCL.so		   libhipfft.so		      libhsa-runtime64.so.1	     libmigraphx_gpu.so		      libmiopengemm.so		    librocalution_hip.so.0.1.50103   librocfft-device-2.so.0.1.50103  librocm-dbgapi.so.0.64.0	    librocsparse.so		 migraphx.so
libOpenCL.so.1		   libhiprand.so	      libhsa-runtime64.so.1.5.50103  libmigraphx_gpu.so.2	      libmiopengemm.so.1	    librocblas.so		     librocfft-device-3.so	      librocm-debug-agent.so.2	    librocsparse.so.0		 rocmmod
libOpenCL.so.1.2	   libhiprand.so.1	      libhsakmt.a		     libmigraphx_gpu.so.2.1.50103     libmiopengemm.so.1.0.50103    librocblas.so.0		     librocfft-device-3.so.0	      librocm-debug-agent.so.2.0.3  librocsparse.so.0.1.50103
libamd_comgr.so		   libhiprand.so.1.1.50103    libmigraphx.so		     libmigraphx_onnx.so	      library			    librocblas.so.0.1.50103	     librocfft-device-3.so.0.1.50103  librocm_smi64.so		    libroctracer64.so
libamd_comgr.so.2	   libhipsolver.so	      libmigraphx.so.2		     libmigraphx_onnx.so.2	      librccl.so		    librocfft-device-0.so	     librocfft.so		      librocm_smi64.so.5	    libroctracer64.so.1
libamd_comgr.so.2.4.50103  libhipsolver.so.0	      libmigraphx.so.2.1.50103	     libmigraphx_onnx.so.2.1.50103    librccl.so.1		    librocfft-device-0.so.0	     librocfft.so.0		      librocprofiler64.so	    libroctracer64.so.1.0.50103
libamdhip64.so		   libhipsolver.so.0.1.50103  libmigraphx_c.so		     libmigraphx_ref.so		      librccl.so.1.0.50103	    librocfft-device-0.so.0.1.50103  librocfft.so.0.1.50103	      librocrand.so		    libroctx64.so
$ ls /global/software/rocm/rocm-5.3.0/lib
CMakeFiles		libamd_comgr.so.2.4.50300  libhiprand.so.1		    libhipsolver.so.0.1.50300		  libhsakmt.a		      librocalution_hip.so	       librocfft-device-1.so.0		librocfft.so.0.1.50300	      librocm_smi64.so.5	     librocsolver.so.0.1.50300	rocblas
cmake			libamdhip64.so		   libhiprand.so.1.1.50300	    libhipsparse.so			  liboam.so		      librocalution_hip.so.0	       librocfft-device-1.so.0.1.50300	librocm-core.so		      librocm_smi64.so.5.0.50300     librocsparse.so		rocmmod
libMIOpen.so		libamdhip64.so.5	   libhiprtc-builtins.so	    libhipsparse.so.0			  liboam.so.1		      librocalution_hip.so.0.1.50300   librocfft-device-2.so		librocm-core.so.1	      librocprofiler64.so	     librocsparse.so.0		rocprofiler
libMIOpen.so.1		libamdhip64.so.5.3.50300   libhiprtc-builtins.so.5	    libhipsparse.so.0.1.50300		  liboam.so.1.0.50300	      librocblas.so		       librocfft-device-2.so.0		librocm-core.so.1.0.50300     librocprofiler64.so.1	     librocsparse.so.0.1.50300	roctracer
libMIOpen.so.1.0.50300	libamdocl64.so		   libhiprtc-builtins.so.5.3.50300  libhsa-amd-aqlprofile64.so		  librccl.so		      librocblas.so.0		       librocfft-device-2.so.0.1.50300	librocm-dbgapi.so	      librocprofiler64.so.1.0.50300  libroctracer64.so
libOpenCL.so		libhipblas.so		   libhiprtc.so			    libhsa-amd-aqlprofile64.so.1	  librccl.so.1		      librocblas.so.0.1.50300	       librocfft-device-3.so		librocm-dbgapi.so.0	      librocrand.so		     libroctracer64.so.4
libOpenCL.so.1		libhipblas.so.0		   libhiprtc.so.5		    libhsa-amd-aqlprofile64.so.1.0.50300  librccl.so.1.0.50300	      librocfft-device-0.so	       librocfft-device-3.so.0		librocm-dbgapi.so.0.67.0      librocrand.so.1		     libroctracer64.so.4.1.0
libOpenCL.so.1.2	libhipblas.so.0.1.50300    libhiprtc.so.5.3.50300	    libhsa-runtime64.so			  librocalution.so	      librocfft-device-0.so.0	       librocfft-device-3.so.0.1.50300	librocm-debug-agent.so.2      librocrand.so.1.1.50300	     libroctx64.so
libamd_comgr.so		libhipfft.so		   libhipsolver.so		    libhsa-runtime64.so.1		  librocalution.so.0	      librocfft-device-0.so.0.1.50300  librocfft.so			librocm-debug-agent.so.2.0.3  librocsolver.so		     libroctx64.so.4
libamd_comgr.so.2	libhiprand.so		   libhipsolver.so.0		    libhsa-runtime64.so.1.7.50300	  librocalution.so.0.1.50300  librocfft-device-1.so	       librocfft.so.0			librocm_smi64.so	      librocsolver.so.0		     libroctx64.so.4.1.0
  2. Compiling the HIP code fails because of the -hc flag
  [10/75] Building HIPCC object lib/src/rocm/CMakeFiles/deepmd_op_rocm.dir/deepmd_op_rocm_generated_neighbor_list.hip.cu.o
  FAILED: lib/src/rocm/CMakeFiles/deepmd_op_rocm.dir/deepmd_op_rocm_generated_neighbor_list.hip.cu.o
  cd /tmp/pip-req-build-mge30ha6/_skbuild/linux-x86_64-3.8/cmake-build/lib/src/rocm/CMakeFiles/deepmd_op_rocm.dir && /usr/bin/cmake -E make_directory /tmp/pip-req-build-mge30ha6/_skbuild/linux-x86_64-3.8/cmake-build/lib/src/rocm/CMakeFiles/deepmd_op_rocm.dir//. && /usr/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/tmp/pip-req-build-mge30ha6/_skbuild/linux-x86_64-3.8/cmake-build/lib/src/rocm/CMakeFiles/deepmd_op_rocm.dir//./deepmd_op_rocm_generated_neighbor_list.hip.cu.o -P /tmp/pip-req-build-mge30ha6/_skbuild/linux-x86_64-3.8/cmake-build/lib/src/rocm/CMakeFiles/deepmd_op_rocm.dir//deepmd_op_rocm_generated_neighbor_list.hip.cu.o.cmake
  clang-14: error: unknown argument: '-hc'
  CMake Error at deepmd_op_rocm_generated_neighbor_list.hip.cu.o.cmake:146 (message):
    Error generating
    /tmp/pip-req-build-mge30ha6/_skbuild/linux-x86_64-3.8/cmake-build/lib/src/rocm/CMakeFiles/deepmd_op_rocm.dir//./deepmd_op_rocm_generated_neighbor_list.hip.cu.o

set (HIP_HIPCC_FLAGS -hc; -fno-gpu-rdc; --amdgpu-target=gfx906; -fPIC; -O3; --std=c++11)
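A rough sketch of how both points might be handled, not a tested patch. It assumes ROCM_ROOT points at the ROCm prefix as in the build command below, and it tries amdhip64 as a candidate only because libamdhip64.so shows up in both listings above. All SKETCH_* variable names are made up for illustration and are not existing variables in the project.

```cmake
# Sketch only. ROCM_ROOT is assumed to be the ROCm install prefix;
# every SKETCH_* name here is hypothetical.
set(SKETCH_ROCM_LIBDIR "${ROCM_ROOT}/lib")

# Probe each candidate library without aborting when one is absent,
# since hip_hcc and hiprtc are missing from some of the listings above.
set(SKETCH_ROCM_LIBS "")
foreach(candidate amdhip64 hip_hcc hiprtc)
  find_library(SKETCH_LIB_${candidate}
    NAMES ${candidate}
    PATHS "${SKETCH_ROCM_LIBDIR}"
    NO_DEFAULT_PATH)
  if(SKETCH_LIB_${candidate})
    list(APPEND SKETCH_ROCM_LIBS "${SKETCH_LIB_${candidate}}")
  else()
    message(STATUS "ROCm library ${candidate} not found in ${SKETCH_ROCM_LIBDIR}")
  endif()
endforeach()

# Fail only if no HIP runtime library is available at all.
if(NOT SKETCH_LIB_amdhip64 AND NOT SKETCH_LIB_hip_hcc)
  message(FATAL_ERROR "No HIP runtime library found under ${SKETCH_ROCM_LIBDIR}")
endif()

# Same flags as the line above, minus -hc, which clang-14 rejects in the log.
set(HIP_HIPCC_FLAGS -fno-gpu-rdc --amdgpu-target=gfx906 -fPIC -O3 --std=c++11)
```

Whether hiprtc should stay optional depends on whether the ops actually call into it, which I have not checked.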

btw, I don't understand why --amdgpu-target is hard-coded to one specific target (gfx906).
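One way the target could be made configurable, as a sketch under assumptions: HIP_GPU_TARGETS is a hypothetical cache variable name, not an existing option of the project, and I am assuming hipcc accepts the flag repeated once per target. The default keeps the current gfx906 behaviour.

```cmake
# Sketch only. HIP_GPU_TARGETS and SKETCH_TARGET_FLAGS are hypothetical names.
set(HIP_GPU_TARGETS "gfx906" CACHE STRING
    "Semicolon-separated list of AMD GPU targets to compile for")

# Build one --amdgpu-target flag per requested target.
set(SKETCH_TARGET_FLAGS "")
foreach(target IN LISTS HIP_GPU_TARGETS)
  list(APPEND SKETCH_TARGET_FLAGS "--amdgpu-target=${target}")
endforeach()

# -hc is left out here because the log above shows clang-14 rejecting it.
set(HIP_HIPCC_FLAGS -fno-gpu-rdc ${SKETCH_TARGET_FLAGS} -fPIC -O3 --std=c++11)
```

With something like this, a user on different hardware could pass e.g. -DHIP_GPU_TARGETS="gfx906;gfx908" at configure time instead of editing the CMake file.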

Steps to Reproduce

git clone https://github.com/deepmodeling/deepmd-kit -b devel
cd deepmd-kit
DP_VARIANT=rocm ROCM_ROOT=/global/software/rocm/rocm-5.1.3 pip install -v . --no-build-isolation

Further Information, Files, and Links

No response