@csegarragonz commented Jun 23, 2021

After #118, dequeuing has a timeout of 500 ms. Unfortunately, there is an MPI test in faasm (the barrier test) that blocks on a dequeue for 1 s (the 1 s wait is enforced through sleep calls), which exceeds the new timeout. We can therefore either:

  • Make the timeout in faabric larger (done in this PR; happy to revert)
  • Add a separate MPI_QUEUE_TIMEOUT (I originally opted for this, but given how controversial timeout values are, I doubt adding another one would solve anything)
  • Change the test in clients/cpp
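To make the failure mode concrete, here is a minimal C++ sketch of a queue whose `dequeue` gives up after a timeout, assuming it behaves roughly like the post-#118 faabric queue. `TimedQueue`, `QueueTimeoutException` and `delayedSendIsReceived` are hypothetical names, not faabric's API. The helper mimics the barrier test: one thread sleeps before sending while the other blocks on a timed dequeue.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <stdexcept>
#include <thread>
#include <utility>

// Hypothetical exception thrown when a dequeue times out (not faabric's type)
class QueueTimeoutException : public std::runtime_error
{
  public:
    using std::runtime_error::runtime_error;
};

// Hypothetical queue with a timed dequeue, loosely modelled on the
// behaviour described after #118; names and signatures are assumptions.
template<typename T>
class TimedQueue
{
  public:
    void enqueue(T value)
    {
        {
            std::lock_guard<std::mutex> lock(mx);
            items.push(std::move(value));
        }
        cv.notify_one();
    }

    // Block until an item is available or the timeout expires
    T dequeue(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(mx);
        bool ready =
          cv.wait_for(lock, timeout, [this] { return !items.empty(); });
        if (!ready) {
            throw QueueTimeoutException("Timeout waiting for dequeue");
        }
        // The predicate guarantees the queue is non-empty here
        assert(!items.empty());
        T value = std::move(items.front());
        items.pop();
        return value;
    }

  private:
    std::mutex mx;
    std::condition_variable cv;
    std::queue<T> items;
};

// Mimics a test that sleeps before sending: returns true if the receiver
// gets the message before its dequeue timeout expires.
bool delayedSendIsReceived(std::chrono::milliseconds sendDelay,
                           std::chrono::milliseconds recvTimeout)
{
    TimedQueue<int> queue;
    std::thread sender([&queue, sendDelay] {
        std::this_thread::sleep_for(sendDelay);
        queue.enqueue(42);
    });

    bool received = true;
    try {
        queue.dequeue(recvTimeout);
    } catch (const QueueTimeoutException&) {
        received = false;
    }

    // Join before returning so the queue outlives the sender thread
    sender.join();
    return received;
}
```

With a 1 s send delay and a 500 ms receive timeout, `delayedSendIsReceived(std::chrono::milliseconds(1000), std::chrono::milliseconds(500))` is expected to return `false`. Raising the timeout above the delay (the first option) makes it succeed, at the cost of waiting longer before reporting a message that genuinely never arrives.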

In favour of changing the timeout value here: there may be occasions in which we deliberately block on a send/recv as an ad hoc synchronisation point, and 500 ms may be too short for that.

While debugging this issue, I realised we weren't testing the barrier at all in faabric, so I added a couple of tests for it.
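For context on what a barrier test needs to check, the sketch below builds a barrier from point-to-point messages: every non-root rank sends to rank 0, which waits for all of them and then releases each rank. This is a common way to implement `MPI_Barrier`, but here it is an assumption, not necessarily how faabric does it; `Mailbox`, `barrier` and `barrierHoldsUntilAllArrive` are hypothetical names. The check staggers arrivals with sleeps and verifies that no rank leaves before the last one arrives.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical per-rank mailbox with a blocking (untimed) recv
class Mailbox
{
  public:
    void send(int fromRank)
    {
        {
            std::lock_guard<std::mutex> lock(mx);
            msgs.push(fromRank);
        }
        cv.notify_one();
    }

    int recv()
    {
        std::unique_lock<std::mutex> lock(mx);
        cv.wait(lock, [this] { return !msgs.empty(); });
        int from = msgs.front();
        msgs.pop();
        return from;
    }

  private:
    std::mutex mx;
    std::condition_variable cv;
    std::queue<int> msgs;
};

// Hypothetical message-based barrier: gather at rank 0, then release all
void barrier(int rank, std::vector<Mailbox>& mailboxes)
{
    int worldSize = static_cast<int>(mailboxes.size());
    assert(rank >= 0 && rank < worldSize);

    if (rank == 0) {
        // Root: wait until every other rank has announced its arrival...
        for (int i = 1; i < worldSize; i++) {
            mailboxes[0].recv();
        }
        // ...then release them all
        for (int r = 1; r < worldSize; r++) {
            mailboxes[r].send(0);
        }
    } else {
        mailboxes[0].send(rank); // announce arrival to the root
        mailboxes[rank].recv();  // block until the root releases this rank
    }
}

// Rank r sleeps r * 50 ms before entering the barrier; returns true if no
// rank left the barrier before the slowest rank arrived.
bool barrierHoldsUntilAllArrive(int worldSize)
{
    assert(worldSize > 0);
    using Clock = std::chrono::steady_clock;
    std::vector<Mailbox> mailboxes(worldSize);
    std::vector<Clock::time_point> arrived(worldSize), left(worldSize);
    std::vector<std::thread> ranks;

    for (int r = 0; r < worldSize; r++) {
        ranks.emplace_back([&, r] {
            std::this_thread::sleep_for(std::chrono::milliseconds(50 * r));
            arrived[r] = Clock::now();
            barrier(r, mailboxes);
            left[r] = Clock::now();
        });
    }
    for (auto& t : ranks) {
        t.join();
    }

    Clock::time_point lastArrival =
      *std::max_element(arrived.begin(), arrived.end());
    return std::all_of(left.begin(), left.end(), [&](Clock::time_point leftAt) {
        return leftAt >= lastArrival;
    });
}
```

Note that the root's `recv` blocks for as long as the slowest rank takes to arrive. If that `recv` were a timed dequeue, the slowest rank's delay would have to stay below the timeout, which is the situation described above for the faasm barrier test.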

Lastly, once this is merged I will bump the faabric dependency in faasm. It's cumbersome to do this often, but it helps catch this sort of bug early.

@csegarragonz self-assigned this Jun 23, 2021
@csegarragonz requested a review from Shillaker June 23, 2021 08:04
@csegarragonz added the mpi (Related to the MPI implementation) and bug (Something isn't working) labels Jun 23, 2021
@csegarragonz merged commit 608967e into master Jun 23, 2021
@csegarragonz deleted the barrier branch June 23, 2021 09:40