test MPI GPU awareness using Send/Recv instead of MPI_Allreduce #310

glesur · 2025-01-15T11:15:35Z

For some obscure reason, MPI_Allreduce is broken on CUDA arrays with OpenMPI 4.1.5. This led Idefix to conclude that it was not running with a CUDA-aware library on JeanZay with H100 GPUs (for which only OpenMPI 4.1.5 is available).

This issue might be related to open-mpi/ompi#9845

Since Idefix does not use reduction operations on GPUs, only Send/Recv, this PR works around the problem by testing the MPI library with Send/Recv instead of Allreduce to detect non-CUDA-aware libraries.
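For illustration, here is a minimal sketch of what a point-to-point round-trip probe could look like. It is not the code from this PR: `Transport`, `probeRoundTrip` and `mpiSelfSendRecv` are hypothetical names, and the buffers are assumed to be host-readable so that the pattern can be filled and compared directly. A real GPU check would pass device pointers and stage the fill and compare through device copies. The MPI part is guarded so the sketch also compiles where no MPI headers are installed.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical transport: moves `count` values from `send` to `recv` and
// returns true if the underlying library reported success.
using Transport =
    std::function<bool(const std::int64_t* send, std::int64_t* recv, int count)>;

// Fill `send` with a known pattern, push it through `transport`, and check
// that `recv` holds the same values afterwards.
// Assumption: both buffers are readable from the host in this sketch.
inline bool probeRoundTrip(const Transport& transport,
                           std::int64_t* send, std::int64_t* recv, int count) {
  assert(count > 0);
  for (int i = 0; i < count; ++i) {
    send[i] = 1000 + i;  // recognizable pattern
    recv[i] = -1;        // poison value, must be overwritten
  }
  if (!transport(send, recv, count)) return false;
  for (int i = 0; i < count; ++i) {
    if (recv[i] != send[i]) return false;
  }
  return true;
}

#if __has_include(<mpi.h>)
#include <mpi.h>

// Point-to-point transport: each rank sends the buffer to itself with
// MPI_Sendrecv, so no reduction operation is involved.
// Must be called between MPI_Init and MPI_Finalize.
inline bool mpiSelfSendRecv(const std::int64_t* send, std::int64_t* recv,
                            int count) {
  int rank = 0;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  // Ask MPI to return error codes instead of aborting the run.
  MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
  const int tag = 0;
  const int err = MPI_Sendrecv(send, count, MPI_INT64_T, rank, tag,
                               recv, count, MPI_INT64_T, rank, tag,
                               MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  return err == MPI_SUCCESS;
}
#endif
```

With MPI initialized, the probe would be called as `probeRoundTrip(mpiSelfSendRecv, send, recv, n)`. Note that a library without GPU support may crash when handed a device pointer rather than return an error, so a production check may also need to trap that case.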

test MPI using Send/Recv instead of reduction

4ab8212

glesur added the bug Something isn't working label Jan 15, 2025

glesur merged commit 7b2910c into develop Jan 15, 2025
38 checks passed

glesur deleted the useMPISendForTesting branch January 15, 2025 12:29

This was referenced Jan 15, 2025

V2.2.00 #311

Merged

V2.2.00 #313

Merged
