Rebuild ESPResSo for CUDA sanity check #1168
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80
New job on instance
Hm... this seems to have been compiled for SM_52, with PTX code for SM_52 as well. So something needs to be fixed here: how do we convince the ESPResSo build system to compile for a different CUDA arch?
Seems like that's hardcoded here for version 4.2.2:
Though new versions apparently will allow you to explicitly define this:
The following patch should work:

diff --git a/cmake/FindCUDACompilerNVCC.cmake b/cmake/FindCUDACompilerNVCC.cmake
index 08f9db312..f68d4db94 100644
--- a/cmake/FindCUDACompilerNVCC.cmake
+++ b/cmake/FindCUDACompilerNVCC.cmake
@@ -52,8 +52,16 @@ list(APPEND CUDA_NVCC_FLAGS_COVERAGE -O3 -g -Xptxas=-O3 -Xcompiler=-Og,-g)
 list(APPEND CUDA_NVCC_FLAGS_RELWITHASSERT -O3 -g -Xptxas=-O3 -Xcompiler=-O3,-g)
-if(CMAKE_CUDA_COMPILER_VERSION VERSION_LESS 11)
-  list(APPEND CUDA_NVCC_FLAGS -gencode=arch=compute_30,code=sm_30)
+if(NOT DEFINED ESPRESSO_CUDA_ARCHITECTURES)
+  if("$ENV{CUDAARCHS}" STREQUAL "")
+    set(ESPRESSO_CUDA_ARCHITECTURES "75;86;89" CACHE INTERNAL "")
+  else()
+    set(ESPRESSO_CUDA_ARCHITECTURES "$ENV{CUDAARCHS}" CACHE INTERNAL "")
+  endif()
 endif()
+foreach(ESPRESSO_CUDA_ARCH ${ESPRESSO_CUDA_ARCHITECTURES})
+  list(APPEND CUDA_NVCC_FLAGS
+       "-gencode=arch=compute_${ESPRESSO_CUDA_ARCH},code=sm_${ESPRESSO_CUDA_ARCH}"
+       "-gencode=arch=compute_${ESPRESSO_CUDA_ARCH},code=compute_${ESPRESSO_CUDA_ARCH}")
+endforeach()
 list(APPEND CUDA_NVCC_FLAGS
-     -gencode=arch=compute_52,code=sm_52
-     -gencode=arch=compute_52,code=compute_52 -std=c++${CMAKE_CUDA_STANDARD}
+     -std=c++${CMAKE_CUDA_STANDARD}
      $<$<BOOL:${WARNINGS_ARE_ERRORS}>:-Xcompiler=-Werror;-Xptxas=-Werror>

With this one can invoke the
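
To make the precedence in the patch easier to check, here is a rough Python restatement of its logic. It is only a sketch, not part of the ESPResSo build system: the function names `resolve_archs` and `gencode_flags` are made up for illustration, and the `explicit` argument stands in for an `ESPRESSO_CUDA_ARCHITECTURES` value that was already defined before the patched code runs.

```python
import os

# Fallback list taken from the patch above.
DEFAULT_ARCHS = "75;86;89"


def resolve_archs(explicit=None, env=None):
    """Pick the architecture list the way the patch does.

    Precedence: an already-defined value (hypothetical `explicit` argument),
    then a non-empty CUDAARCHS environment variable, then the default.
    """
    env = os.environ if env is None else env
    if explicit is not None:
        raw = explicit
    elif env.get("CUDAARCHS", "") != "":
        raw = env["CUDAARCHS"]
    else:
        raw = DEFAULT_ARCHS
    # The value is a semicolon-separated list; skip empty entries.
    return [arch for arch in raw.split(";") if arch]


def gencode_flags(archs):
    """Emit one SASS and one PTX -gencode flag per architecture."""
    flags = []
    for arch in archs:
        flags.append(f"-gencode=arch=compute_{arch},code=sm_{arch}")
        flags.append(f"-gencode=arch=compute_{arch},code=compute_{arch}")
    return flags
```

For a cc80 target, calling `gencode_flags(resolve_archs(env={"CUDAARCHS": "80"}))` would yield the `sm_80` and `compute_80` flags instead of the hardcoded `52` ones.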
…50828-eb-5.1.1-rebuild-ESPResSo-for-cuda-sanity-check.yml
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80
New job on instance
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80
New job on instance
edit: oh, of course, we need to set
Or pass
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/intel/icelake,accel=nvidia/cc80
New job on instance
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-mc-aws on:arch=zen2 for:arch=x86_64/amd/zen2,accel=nvidia/cc70
New job on instance
Only need to do one more build with the Surf bot, but the cluster is currently down.
Snellius is back online. Not sure if the bot is already available, but let's give it a try:
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90
New job on instance
The Surf job was OOM-killed. I don't really understand this: the other job that ran on Snellius (for icelake+cc80, see #1168 (comment)) only used:
Let's try it one more time:
bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90
New job on instance
Staging PR merged, all tarballs have been ingested 🎉
Note: I've seen in an interactive build that this didn't pass the CUDA sanity check. So we may have to investigate why not (i.e. which files don't provide the correct device code, and why), and how to make it pass...
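
As a starting point for that investigation, something along these lines could flag the files that lack device code for the target. This is only a sketch under assumptions: the helper name `files_missing_arch` is hypothetical, the input mapping (file path to the architectures found in it) has to be collected separately with whatever inspection tool is used, and an `accel=nvidia/ccNN` target is assumed to correspond to `sm_NN`.

```python
def files_missing_arch(found, target):
    """Return the files that carry no device code for the target.

    found:  dict mapping a file path to the architectures found in it,
            e.g. ["sm_52"]; how this is collected is outside this sketch.
    target: compute capability as written in the bot command, e.g. "cc80".
    """
    # Assumption: "cc80" in the accel string corresponds to "sm_80".
    number = target[2:] if target.startswith("cc") else target
    wanted = "sm_" + number
    return sorted(path for path, archs in found.items() if wanted not in set(archs))
```

Given a mapping in which one library only reports `sm_52` and another reports both `sm_52` and `sm_80`, a call with `"cc80"` would return only the first library, which is the kind of list the sanity check failure would need to be traced back to.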