forked from abacusmodeling/abacus-develop
Closed
Labels: Compile & CICD & Docs & Dependencies; GPU & DCU & HPC; Questions
Description
Details
Initially, I successfully installed the GPU version of ABACUS on the HanHai 22 supercomputing platform using icpc. The commands and environment used are as follows:
environment:
module purge
module load cmake/3.19.0
module load tbb/2021.6.0
module load compiler-rt/2022.1.0
module load oclfpga/2022.1.0
module load compiler/2022.1.0
module load mpi/2021.6.0
module load mkl/2022.1.0
module load cuda/12.8
module load elpa/2023.11.001/intelmpi/2021.6/intel-2022.1.0
instruction:
CC=mpiicc CXX=mpiicpc FC=mpiifort \
cmake -B build -DCereal_INCLUDE_DIR=/home/liuxiaohuigroup/handy/cereal-1.3.2/include \
-DELPA_LINK_LIBRARIES=/opt/elpa/2023.11.001/intelmpi/2021.6/intel-2022.1.0/lib/libelpa.so \
-DELPA_INCLUDE_DIR=/opt/elpa/2023.11.001/intelmpi/2021.6/intel-2022.1.0/include/elpa-2023.11.001 \
-DUSE_OPENMP=ON -DENABLE_LCAO=ON -DUSE_CUDA=ON -DUSE_ELPA=ON -DDEBUG_INFO=1
cd build && make -j32
Then, following the instructions on the website, I wanted to enable the -DENABLE_CUSOLVERMP=ON option.
Below are the environment and settings I changed so that the CMake configuration step succeeds with this option.
environment:
module purge
module load cmake/3.19.0
module load tbb/2021.6.0
module load compiler-rt/2022.1.0
module load oclfpga/2022.1.0
module load compiler/2022.1.0
module load mpi/2021.6.0
module load mkl/2022.1.0
module load nvhpc-byo-compiler/22.7
module load cuda/12.8
module load elpa/2023.11.001/intelmpi/2021.6/intel-2022.1.0
instruction:
CC=mpiicc CXX=mpiicpc FC=mpiifort \
cmake -B build \
-DCereal_INCLUDE_DIR=/home/liuxiaohuigroup/handy/cereal-1.3.2/include \
-DCMAKE_CUDA_COMPILER=/opt/cuda/12.8/bin/nvcc \
-DELPA_LINK_LIBRARIES=/opt/elpa/2023.11.001/intelmpi/2021.6/intel-2022.1.0/lib/libelpa.so \
-DELPA_INCLUDE_DIR=/opt/elpa/2023.11.001/intelmpi/2021.6/intel-2022.1.0/include/elpa-2023.11.001 \
-DCAL_CUSOLVERMP_PATH=/opt/hpc_sdk/2022_227/Linux_x86_64/22.7/math_libs/lib64 \
-DUSE_OPENMP=ON \
-DENABLE_LCAO=ON \
-DUSE_CUDA=ON \
-DUSE_ELPA=ON \
-DDEBUG_INFO=1 \
-DENABLE_CUSOLVERMP=ON
cd build/
make VERBOSE=1 -j$(nproc) > build_log.txt 2>&1
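Because the build runs in parallel and writes everything to build_log.txt, the first real compiler error can be buried far above the final "Error 2" lines. A minimal sketch for pulling out the earliest error lines is shown here; the function name `first_errors` is hypothetical and not part of ABACUS or CMake, and it only assumes that compiler diagnostics contain the lowercase text `error:`.

```shell
# first_errors LOG [N]
# Print the first N lines of LOG that contain "error:", prefixed with
# their line number in the log. (Hypothetical helper for illustration.)
first_errors() {
    log=${1:-build_log.txt}   # default to the log name used above
    n=${2:-5}                 # default to the first five matches
    # "make: *** ... Error 2" lines do not match: they have no "error:".
    grep -n -m "$n" 'error:' "$log"
}
```

Calling `first_errors build_log.txt 5` from the build directory prints at most five matching lines, which is usually enough to see which header or source file fails first.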
These are some module details.
/opt/MODULES/compiler/nvhpc-byo-compiler/22.7:
conflict nvhpc
conflict nvhpc-nompi
conflict nvhpc-byo-compiler
setenv NVHPC /opt/hpc_sdk/2022_227
setenv NVHPC_ROOT /opt/hpc_sdk/2022_227/Linux_x86_64/22.7
prepend-path PATH /opt/hpc_sdk/2022_227/Linux_x86_64/22.7/cuda/bin
prepend-path CPATH /opt/hpc_sdk/2022_227/Linux_x86_64/22.7/cuda/include
prepend-path CPATH /opt/hpc_sdk/2022_227/Linux_x86_64/22.7/math_libs/include
prepend-path CPATH /opt/hpc_sdk/2022_227/Linux_x86_64/22.7/comm_libs/nccl/include
prepend-path CPATH /opt/hpc_sdk/2022_227/Linux_x86_64/22.7/comm_libs/nvshmem/include
prepend-path LD_LIBRARY_PATH /opt/hpc_sdk/2022_227/Linux_x86_64/22.7/cuda/lib64
prepend-path LD_LIBRARY_PATH /opt/hpc_sdk/2022_227/Linux_x86_64/22.7/cuda/extras/CUPTI/lib64
prepend-path LD_LIBRARY_PATH /opt/hpc_sdk/2022_227/Linux_x86_64/22.7/math_libs/lib64
prepend-path LD_LIBRARY_PATH /opt/hpc_sdk/2022_227/Linux_x86_64/22.7/comm_libs/nccl/lib
prepend-path LD_LIBRARY_PATH /opt/hpc_sdk/2022_227/Linux_x86_64/22.7/comm_libs/nvshmem/lib
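Since this module prepends the NVHPC 22.7 `cuda/bin` directory to PATH and the environment also loads cuda/12.8, more than one `nvcc` may be visible at the same time. The following sketch lists every match on PATH in search order; `which_all` is a hypothetical helper name, and whether a mixed toolchain is related to the error reported in this issue is not established here.

```shell
# which_all NAME
# Print every executable called NAME found on PATH, in search order.
# (Hypothetical helper for illustration; written for sh/bash.)
which_all() {
    name=$1
    old_ifs=$IFS
    IFS=:                      # split PATH on colons
    for dir in $PATH; do
        dir=${dir:-.}          # an empty PATH entry means the current directory
        [ -x "$dir/$name" ] && printf '%s\n' "$dir/$name"
    done
    IFS=$old_ifs
}
```

Running `which_all nvcc` after loading the modules shows which compiler comes first. The CMake command in this report pins `/opt/cuda/12.8/bin/nvcc` explicitly through -DCMAKE_CUDA_COMPILER, so the output is mainly useful for spotting a second toolkit whose headers or libraries could be picked up through CPATH or LD_LIBRARY_PATH.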
****
/opt/hpc_sdk/2022_227/Linux_x86_64/22.7/math_libs/lib64$ ls
libcal.so libcublas_static.a libcufft_static_nocallback.a libcurand_static.a libcusolver.so.11.3.5.50 libcutensorMg.so.1.5.0 libnvblas.so
libcublasLt.so libcufftMp.so libcufftw.so libcusolver_lapack_static.a libcusolver_static.a libcutensorMg_static.a libnvblas.so.11
libcublasLt.so.11 libcufftMp.so.10 libcufftw.so.10 libcusolverMg.so libcusparse.so libcutensor.so libnvblas.so.11.10.1.25
libcublasLt.so.11.10.1.25 libcufftMp.so.10.8.1 libcufftw.so.10.7.2.50 libcusolverMg.so.11 libcusparse.so.11 libcutensor.so.1 stubs
libcublasLt_static.a libcufft.so libcufftw_static.a libcusolverMg.so.11.3.5.50 libcusparse.so.11.7.3.50 libcutensor.so.1.5.0
libcublas.so libcufft.so.10 libcurand.so libcusolverMp.so libcusparse_static.a libcutensor_static.a
libcublas.so.11 libcufft.so.10.7.2.50 libcurand.so.10 libcusolver.so libcutensorMg.so liblapack_static.a
libcublas.so.11.10.1.25 libcufft_static.a libcurand.so.10.2.10.50 libcusolver.so.11 libcutensorMg.so.1 libmetis_static.a
My cmake phase works fine, but when I run make, I get the following error:
[ 90%] Built target cell
[ 90%] Linking CXX static library libcontainer.a
[ 90%] Built target container
[ 90%] Built target elecstate
[ 90%] Built target vdw
[ 90%] Built target io_basic
[ 90%] Built target device
[ 90%] Built target gint
make: *** [Makefile:149: all] Error 2
The build_log.txt file displays the following detailed error message:
In file included from /home/liuxiaohuigroup/handy/abacus-develop-LTSv3.10.0/source/module_hsolver/diago_cusolvermp.h(8),
from /home/liuxiaohuigroup/handy/abacus-develop-LTSv3.10.0/source/module_hsolver/hsolver_lcao.cpp(11):
/home/liuxiaohuigroup/handy/abacus-develop-LTSv3.10.0/source/module_hsolver/kernels/cuda/diag_cusolvermp.cuh(59): error: identifier "cusolverMpGrid_t" is undefined
cusolverMpGrid_t grid = NULL;
^
In file included from /home/liuxiaohuigroup/handy/abacus-develop-LTSv3.10.0/source/module_hsolver/diago_cusolvermp.h(8),
from /home/liuxiaohuigroup/handy/abacus-develop-LTSv3.10.0/source/module_hsolver/hsolver_lcao.cpp(11):
/home/liuxiaohuigroup/handy/abacus-develop-LTSv3.10.0/source/module_hsolver/kernels/cuda/diag_cusolvermp.cuh(62): error: identifier "cusolverMpMatrixDescriptor_t" is undefined
cusolverMpMatrixDescriptor_t desc_for_cusolvermp = NULL;
^
......
make[2]: *** [source/module_hsolver/CMakeFiles/diag_cusolver.dir/build.make:212: source/module_hsolver/CMakeFiles/diag_cusolver.dir/hsolver_lcao.cpp.o] Error 2
make[2]: *** Waiting for unfinished jobs....
...
[ 90%] Built target gint
make[1]: Leaving directory '/home/liuxiaohuigroup/handy/abacus-develop-LTSv3.10.0/build'
make: *** [Makefile:149: all] Error 2
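The errors above are about undeclared identifiers, so they come from the headers seen at compile time rather than from the library being linked. One way to narrow this down, as a sketch only, is to search the include directories for a header that mentions `cusolverMpGrid_t`. The helper name `find_decl` is hypothetical, and the `/opt/cuda/12.8/include` path in the usage note is an assumption based on the usual CUDA toolkit layout.

```shell
# find_decl SYMBOL DIR...
# List header files under the given directories that mention SYMBOL
# as a whole word. Missing directories are skipped.
# (Hypothetical helper for illustration; relies on grep's --include option.)
find_decl() {
    symbol=$1
    shift
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        grep -rlw --include='*.h' -- "$symbol" "$dir"
    done
}
```

For example, `find_decl cusolverMpGrid_t "$NVHPC_ROOT/math_libs/include" /opt/cuda/12.8/include` uses the NVHPC_ROOT variable set by the module file shown earlier. If nothing is printed, no header in those directories declares the type, which would suggest that the cuSOLVERMp header is missing from the include path or comes from a version that does not provide this type.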
Is there a good solution to this problem?
Have you read the FAQ in the online manual? http://abacus.deepmodeling.com/en/latest/community/faq.html
- Yes, I have read the FAQ section of the online manual.
Task list for Issue attackers (only for developers)
- Understand the problem or question described by the user.
- Check if the issue is a known problem or has been addressed in the documentation.
- Test the issue or problem on a similar system or environment, if possible.
- Identify the root cause or provide clarification on the user's question.
- Provide a step-by-step guide, including any necessary resources, to resolve the issue or answer the question.
- If the issue is related to documentation, update the documentation to prevent future confusion (optional).
- If the issue is related to code, consider implementing a fix or improvement (optional).
- Review and incorporate any relevant feedback from users or developers.
- Ensure the user's issue is resolved or their question is answered and close the ticket.