DEBUG {2023.06}[foss/2023a] TensorFlow v2.15.1 w/ CUDA 12.1.1 #808
Just build for a single CPU architecture...
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80
Rebuilding after arg typo got fixed...
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80
…-layer into 2023.06-software.eessi.io-TensorFlow-2.15.1-2023a-CUDA-12.1.1-debug
Rebuilding after EESSI-extend module got updated...
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80
Try building with different Bazel version (6.3.1 instead of 6.1.0)...
bot: build repo:eessi.io-2023.06-software arch:x86_64/amd/zen2 accel:nvidia/cc80
@trz42 Can you retarget this PR?
Closing this. I think @TopRichard has worked on this (or another TensorFlow version) and found a solution.
PR to debug issues building TensorFlow v2.15.1 with CUDA v12.1.1
`tensorflow.py` easyblock that solves an `ImportError` issue with `libnccl.so.2`. See "tweak libpaths in TensorFlow easyblock by adding directory containing libnccl.so.2" (easybuilders/easybuild-easyblocks#3497).

Notes:
- `Bazel`, `ml_dtypes` and `tensorboard` first and install them in the directory for CPU-only software (double-check if and why they are not there yet)
  - `Bazel/6.3.1` is installed but not `Bazel/6.1.0`, which is a dependency for this PR
  - `ml_dtypes` is not installed ... not sure if it should be (see comment/question for tensorboard below) ... OR it's a new dependency for TensorFlow (check easyconfig for CPU-only version)
  - `tensorboard/2.13.0` is available as an extension of the CPU-only installation of `TensorFlow/2.13.0-foss-2023a` ... we might want to install the extension under the GPU directory?
- `cuDNN` is installed again (in directory for CPU-only software) ... maybe related to switching to `EESSI-extend/2023.06-easybuild` and the installation path not being configured correctly
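
To narrow down the `ImportError` around `libnccl.so.2` mentioned in the description, a quick check outside of TensorFlow can help. This is only a minimal sketch, not part of the easyblock fix: the helper names `can_load_shared_library` and `dirs_containing` are made up for illustration, and looking at `LD_LIBRARY_PATH` is an assumption (in an RPATH-based setup that variable may be empty even when the library resolves fine).

```python
import ctypes
import os


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can find and load `name`."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the library (or one of its dependencies) cannot be loaded.
        return False
    return True


def dirs_containing(name: str, search_path: str) -> list[str]:
    """Return the directories in an os.pathsep-separated path that hold a file `name`."""
    return [
        d
        for d in search_path.split(os.pathsep)
        if d and os.path.isfile(os.path.join(d, name))
    ]


if __name__ == "__main__":
    lib = "libnccl.so.2"
    print(f"{lib} loadable: {can_load_shared_library(lib)}")
    # Assumption: LD_LIBRARY_PATH is only one of several places the loader may consult.
    hits = dirs_containing(lib, os.environ.get("LD_LIBRARY_PATH", ""))
    print(f"directories in LD_LIBRARY_PATH containing {lib}: {hits}")
```

Running this with the TensorFlow module loaded shows whether the loader can resolve the library at all, independent of the `import tensorflow` step.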