Sample CI failure:
https://gitlab.tiker.net/inducer/meshmode/-/jobs/533461
Similar failure in grudge:
https://gitlab.tiker.net/inducer/grudge/-/jobs/533485
Sample traceback:
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/var/lib/gitlab-runner/builds/VFPjm48d/0/inducer/meshmode/.env/lib/python3.11/site-packages/mpi4py/run.py", line 208, in <module>
    main()
  File "/var/lib/gitlab-runner/builds/VFPjm48d/0/inducer/meshmode/.env/lib/python3.11/site-packages/mpi4py/run.py", line 198, in main
    run_command_line(args)
  File "/var/lib/gitlab-runner/builds/VFPjm48d/0/inducer/meshmode/.env/lib/python3.11/site-packages/mpi4py/run.py", line 47, in run_command_line
    run_path(sys.argv[0], run_name='__main__')
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "/var/lib/gitlab-runner/builds/VFPjm48d/0/inducer/meshmode/test/test_partition.py", line 609, in <module>
    _test_mpi_boundary_swap(dim, order, num_groups)
  File "/var/lib/gitlab-runner/builds/VFPjm48d/0/inducer/meshmode/test/test_partition.py", line 426, in _test_mpi_boundary_swap
    conns = bdry_setup_helper.complete_some()
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/gitlab-runner/builds/VFPjm48d/0/inducer/meshmode/meshmode/distributed.py", line 332, in complete_some
    data = [self._internal_mpi_comm.recv(status=status)]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "mpi4py/MPI/Comm.pyx", line 1438, in mpi4py.MPI.Comm.recv
  File "mpi4py/MPI/msgpickle.pxi", line 341, in mpi4py.MPI.PyMPI_recv
  File "mpi4py/MPI/msgpickle.pxi", line 303, in mpi4py.MPI.PyMPI_recv_match
mpi4py.MPI.Exception: MPI_ERR_OTHER: known error not in list
Downgrading to libfabric (see here) appears to resolve this.
This is the code in mpi4py that ultimately fails; it is a matched receive (mrecv).
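For anyone trying to reproduce this outside of meshmode, here is a minimal sketch of a matched probe followed by a matched receive, written against mpi4py's public API. It is not the meshmode code: the helper name `matched_self_receive`, the payload, and the tag are made up for illustration, and the rank sends to itself so the snippet should run on a single process. Whether it actually triggers the same `MPI_ERR_OTHER` on an affected libfabric is untested.

```python
try:
    from mpi4py import MPI
except ImportError:  # mpi4py not installed: nothing to demonstrate
    MPI = None


def matched_self_receive(payload, tag=7):
    """Send `payload` to our own rank and pick it up via mprobe + mrecv.

    Hypothetical helper for illustration only. Returns
    (data, source, tag), or None if mpi4py is unavailable.
    """
    if MPI is None:
        return None

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Non-blocking send to ourselves, so one rank is enough.
    send_req = comm.isend(payload, dest=rank, tag=tag)

    # Matched probe: blocks until a message is available and returns a
    # handle that refers to exactly that message.
    status = MPI.Status()
    msg = comm.mprobe(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=status)

    # Matched receive on the handle returned by the probe.
    data = msg.recv()
    send_req.wait()

    return data, status.Get_source(), status.Get_tag()


if __name__ == "__main__":
    print(matched_self_receive({"probe": "demo"}))
```

As far as I know, mpi4py also exposes `mpi4py.rc.recv_mprobe`, which, when set to `False` before `mpi4py.MPI` is first imported, makes `recv` avoid matched probes. That might help narrow down whether the matched-receive path is what the newer libfabric trips over, but treat it as an assumption to verify rather than a confirmed workaround.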
@majosm Got any ideas? (Pinging you since the two of us last touched this code.)