Bug summary
The current code is not compatible with ROCm 5.
DeePMD-kit Version
devel
TensorFlow Version
2.9.1
How did you download the software?
Built from source
Input Files, Running Commands, Error Log, etc.
- CMake tries to find two libraries
deepmd-kit/source/cmake/FindROCM.cmake, line 15 in 6e3d4a6:
set(ROCM_FIND_COMPONENTS hip_hcc hiprtc)
However, I could not find hip_hcc in either ROCm 5.1 or ROCm 5.3. hiprtc is also missing from ROCm 5.1; is that related to the installation?
$ ls /global/software/rocm/rocm-5.1.3/lib
CMakeFiles libamdhip64.so.5 libhipsparse.so libmigraphx_c.so.3 libmigraphx_ref.so.2 librocalution.so librocfft-device-1.so librocm-core.so librocrand.so.1 libroctx64.so.1
cmake libamdhip64.so.5.1.50103 libhipsparse.so.0 libmigraphx_c.so.3.0.50103 libmigraphx_ref.so.2.1.50103 librocalution.so.0 librocfft-device-1.so.0 librocm-core.so.1 librocrand.so.1.1.50103 libroctx64.so.1.0.50103
libMIOpen.so libhipblas.so libhipsparse.so.0.1.50103 libmigraphx_device.so libmigraphx_tf.so librocalution.so.0.1.50103 librocfft-device-1.so.0.1.50103 librocm-core.so.1.0.50103 librocsolver.so migraphx.cpython-36m-x86_64-linux-gnu.so
libMIOpen.so.1 libhipblas.so.0 libhsa-amd-aqlprofile64.so libmigraphx_device.so.2 libmigraphx_tf.so.2 librocalution_hip.so librocfft-device-2.so librocm-dbgapi.so librocsolver.so.0 migraphx.cpython-37m-x86_64-linux-gnu.so
libMIOpen.so.1.0.50103 libhipblas.so.0.1.50103 libhsa-runtime64.so libmigraphx_device.so.2.1.50103 libmigraphx_tf.so.2.1.50103 librocalution_hip.so.0 librocfft-device-2.so.0 librocm-dbgapi.so.0 librocsolver.so.0.1.50103 migraphx.cpython-38-x86_64-linux-gnu.so
libOpenCL.so libhipfft.so libhsa-runtime64.so.1 libmigraphx_gpu.so libmiopengemm.so librocalution_hip.so.0.1.50103 librocfft-device-2.so.0.1.50103 librocm-dbgapi.so.0.64.0 librocsparse.so migraphx.so
libOpenCL.so.1 libhiprand.so libhsa-runtime64.so.1.5.50103 libmigraphx_gpu.so.2 libmiopengemm.so.1 librocblas.so librocfft-device-3.so librocm-debug-agent.so.2 librocsparse.so.0 rocmmod
libOpenCL.so.1.2 libhiprand.so.1 libhsakmt.a libmigraphx_gpu.so.2.1.50103 libmiopengemm.so.1.0.50103 librocblas.so.0 librocfft-device-3.so.0 librocm-debug-agent.so.2.0.3 librocsparse.so.0.1.50103
libamd_comgr.so libhiprand.so.1.1.50103 libmigraphx.so libmigraphx_onnx.so library librocblas.so.0.1.50103 librocfft-device-3.so.0.1.50103 librocm_smi64.so libroctracer64.so
libamd_comgr.so.2 libhipsolver.so libmigraphx.so.2 libmigraphx_onnx.so.2 librccl.so librocfft-device-0.so librocfft.so librocm_smi64.so.5 libroctracer64.so.1
libamd_comgr.so.2.4.50103 libhipsolver.so.0 libmigraphx.so.2.1.50103 libmigraphx_onnx.so.2.1.50103 librccl.so.1 librocfft-device-0.so.0 librocfft.so.0 librocprofiler64.so libroctracer64.so.1.0.50103
libamdhip64.so libhipsolver.so.0.1.50103 libmigraphx_c.so libmigraphx_ref.so librccl.so.1.0.50103 librocfft-device-0.so.0.1.50103 librocfft.so.0.1.50103 librocrand.so libroctx64.so
$ ls /global/software/rocm/rocm-5.3.0/lib
CMakeFiles libamd_comgr.so.2.4.50300 libhiprand.so.1 libhipsolver.so.0.1.50300 libhsakmt.a librocalution_hip.so librocfft-device-1.so.0 librocfft.so.0.1.50300 librocm_smi64.so.5 librocsolver.so.0.1.50300 rocblas
cmake libamdhip64.so libhiprand.so.1.1.50300 libhipsparse.so liboam.so librocalution_hip.so.0 librocfft-device-1.so.0.1.50300 librocm-core.so librocm_smi64.so.5.0.50300 librocsparse.so rocmmod
libMIOpen.so libamdhip64.so.5 libhiprtc-builtins.so libhipsparse.so.0 liboam.so.1 librocalution_hip.so.0.1.50300 librocfft-device-2.so librocm-core.so.1 librocprofiler64.so librocsparse.so.0 rocprofiler
libMIOpen.so.1 libamdhip64.so.5.3.50300 libhiprtc-builtins.so.5 libhipsparse.so.0.1.50300 liboam.so.1.0.50300 librocblas.so librocfft-device-2.so.0 librocm-core.so.1.0.50300 librocprofiler64.so.1 librocsparse.so.0.1.50300 roctracer
libMIOpen.so.1.0.50300 libamdocl64.so libhiprtc-builtins.so.5.3.50300 libhsa-amd-aqlprofile64.so librccl.so librocblas.so.0 librocfft-device-2.so.0.1.50300 librocm-dbgapi.so librocprofiler64.so.1.0.50300 libroctracer64.so
libOpenCL.so libhipblas.so libhiprtc.so libhsa-amd-aqlprofile64.so.1 librccl.so.1 librocblas.so.0.1.50300 librocfft-device-3.so librocm-dbgapi.so.0 librocrand.so libroctracer64.so.4
libOpenCL.so.1 libhipblas.so.0 libhiprtc.so.5 libhsa-amd-aqlprofile64.so.1.0.50300 librccl.so.1.0.50300 librocfft-device-0.so librocfft-device-3.so.0 librocm-dbgapi.so.0.67.0 librocrand.so.1 libroctracer64.so.4.1.0
libOpenCL.so.1.2 libhipblas.so.0.1.50300 libhiprtc.so.5.3.50300 libhsa-runtime64.so librocalution.so librocfft-device-0.so.0 librocfft-device-3.so.0.1.50300 librocm-debug-agent.so.2 librocrand.so.1.1.50300 libroctx64.so
libamd_comgr.so libhipfft.so libhipsolver.so libhsa-runtime64.so.1 librocalution.so.0 librocfft-device-0.so.0.1.50300 librocfft.so librocm-debug-agent.so.2.0.3 librocsolver.so libroctx64.so.4
libamd_comgr.so.2 libhiprand.so libhipsolver.so.0 libhsa-runtime64.so.1.7.50300 librocalution.so.0.1.50300 librocfft-device-1.so librocfft.so.0 librocm_smi64.so librocsolver.so.0 libroctx64.so.4.1.0
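Going by the two listings above, both installations ship libamdhip64 while neither ships libhip_hcc, and libhiprtc appears only in 5.3.0. One possible direction, as a minimal sketch and not the project's actual FindROCM.cmake, is to accept either HIP runtime name and treat hiprtc as optional. The variable names ROCM_HIP_LIBRARY, ROCM_HIPRTC_LIBRARY and ROCM_LIBRARIES, and the /opt/rocm default, are assumptions made for illustration.

```cmake
# Hypothetical sketch; variable names and the default prefix are assumptions.
set(ROCM_ROOT "/opt/rocm" CACHE PATH "ROCm installation prefix")

# Prefer the library name seen in the ROCm 5 listings, fall back to the old one.
find_library(ROCM_HIP_LIBRARY
  NAMES amdhip64 hip_hcc
  HINTS ${ROCM_ROOT}
  PATH_SUFFIXES lib lib64
)

# libhiprtc is present in the 5.3.0 listing but not in 5.1.3, so do not require it.
find_library(ROCM_HIPRTC_LIBRARY
  NAMES hiprtc
  HINTS ${ROCM_ROOT}
  PATH_SUFFIXES lib lib64
)

if(NOT ROCM_HIP_LIBRARY)
  message(FATAL_ERROR "Neither amdhip64 nor hip_hcc was found under ${ROCM_ROOT}")
endif()

# Collect whatever was found; hiprtc is appended only when it exists.
set(ROCM_LIBRARIES ${ROCM_HIP_LIBRARY})
if(ROCM_HIPRTC_LIBRARY)
  list(APPEND ROCM_LIBRARIES ${ROCM_HIPRTC_LIBRARY})
endif()
message(STATUS "ROCm libraries: ${ROCM_LIBRARIES}")
```

Whether code that calls the hiprtc API still links on 5.1.3 without a separate libhiprtc would need to be checked against that installation.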
- Fail to compile HIP code with the -hc flag
[10/75] Building HIPCC object lib/src/rocm/CMakeFiles/deepmd_op_rocm.dir/deepmd_op_rocm_generated_neighbor_list.hip.cu.o
FAILED: lib/src/rocm/CMakeFiles/deepmd_op_rocm.dir/deepmd_op_rocm_generated_neighbor_list.hip.cu.o
cd /tmp/pip-req-build-mge30ha6/_skbuild/linux-x86_64-3.8/cmake-build/lib/src/rocm/CMakeFiles/deepmd_op_rocm.dir && /usr/bin/cmake -E make_directory /tmp/pip-req-build-mge30ha6/_skbuild/linux-x86_64-3.8/cmake-build/lib/src/rocm/CMakeFiles/deepmd_op_rocm.dir//. && /usr/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/tmp/pip-req-build-mge30ha6/_skbuild/linux-x86_64-3.8/cmake-build/lib/src/rocm/CMakeFiles/deepmd_op_rocm.dir//./deepmd_op_rocm_generated_neighbor_list.hip.cu.o -P /tmp/pip-req-build-mge30ha6/_skbuild/linux-x86_64-3.8/cmake-build/lib/src/rocm/CMakeFiles/deepmd_op_rocm.dir//deepmd_op_rocm_generated_neighbor_list.hip.cu.o.cmake
clang-14: error: unknown argument: '-hc'
CMake Error at deepmd_op_rocm_generated_neighbor_list.hip.cu.o.cmake:146 (message):
Error generating
/tmp/pip-req-build-mge30ha6/_skbuild/linux-x86_64-3.8/cmake-build/lib/src/rocm/CMakeFiles/deepmd_op_rocm.dir//./deepmd_op_rocm_generated_neighbor_list.hip.cu.o
set (HIP_HIPCC_FLAGS -hc; -fno-gpu-rdc; --amdgpu-target=gfx906; -fPIC; -O3; --std=c++11)
By the way, I don't understand why --amdgpu-target is hard-coded to one specific target (gfx906).
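A possible way to address both points, sketched under assumptions and not taken from the project, is to drop -hc (which clang-14 rejects in the log above) and build the target flags from a user-settable list. DP_ROCM_ARCHS is a hypothetical cache variable introduced here for illustration; --offload-arch is the clang option I would expect to replace --amdgpu-target, but that should be verified against the hipcc shipped with the ROCm release in use.

```cmake
# Hypothetical option; the name DP_ROCM_ARCHS does not exist in the project.
set(DP_ROCM_ARCHS "gfx906" CACHE STRING "Semicolon-separated list of AMD GPU targets")

# Same flags as the original line, minus -hc and the hard-coded target.
set(HIP_HIPCC_FLAGS -fno-gpu-rdc -fPIC -O3 --std=c++11)

# Emit one target flag per requested architecture.
foreach(arch IN LISTS DP_ROCM_ARCHS)
  list(APPEND HIP_HIPCC_FLAGS "--offload-arch=${arch}")
endforeach()

message(STATUS "HIP_HIPCC_FLAGS: ${HIP_HIPCC_FLAGS}")
```

With this shape, a build for several GPUs could pass something like -DDP_ROCM_ARCHS="gfx906;gfx90a" at configure time, while the default keeps the current behavior.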
Steps to Reproduce
git clone https://github.com/deepmodeling/deepmd-kit -b devel
cd deepmd-kit
DP_VARIANT=rocm ROCM_ROOT=/global/software/rocm/rocm-5.1.3 pip install -v . --no-build-isolation
Further Information, Files, and Links
No response