@glesur commented Jan 15, 2025

For some obscure reason, MPI_Allreduce is broken when called on CUDA (device) arrays with OpenMPI 4.1.5. This led Idefix to conclude that it was not running with a CUDA-aware MPI library on JeanZay with H100 GPUs (for which only OpenMPI 4.1.5 is available).

This issue might be related to open-mpi/ompi#9845

Since Idefix does not use reduction operations on GPUs, but only Send/Recv, this PR patches the problem by testing the MPI library with Send/Recv instead of Allreduce to detect non-CUDA-aware libraries.

@glesur added the bug label Jan 15, 2025
@glesur merged commit 7b2910c into develop Jan 15, 2025
38 checks passed
@glesur deleted the useMPISendForTesting branch January 15, 2025 12:29

2 participants