Skip to content

Conversation

@graingert
Copy link
Member

refs #4133, #6548

  • Tests added / passed
  • Passes pre-commit run --all-files

@github-actions
Copy link
Contributor

github-actions bot commented Jun 14, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       15 files  ±  0         15 suites  ±0   6h 37m 40s ⏱️ + 26m 40s
  2 875 tests +10    2 789 ✔️ +  6    82 💤 +  2  3 +2  1 🔥 ±0 
21 295 runs  +70  20 341 ✔️ +55  950 💤 +13  3 +2  1 🔥 ±0 

For more details on these failures and errors, see this check.

Results for commit b20932d. ± Comparison against base commit 344868a.

♻️ This comment has been updated with latest results.

Copy link
Member

@fjetter fjetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes a lot of sense. Do you see an easy way to test this? I think I wouldn't want to go through a sophisticated mocking/faking exercise just to test this exception handler, though.

FWIW we introduced this abort in #4239 and added test_comm_closed_on_buffer_error to test this which does patch the stream write. Better than nothing?

@fjetter
Copy link
Member

fjetter commented Jun 14, 2022

maybe you're on to something :)
image

@graingert
Copy link
Member Author

You can do an asyncio.wait_for(comm.read(...), 0.05) on a comm pair. It's just writing a test and fixing the other comms is much harder

# was already read from the underlying socket, so it is not even safe to retry
# here using the same stream. The only safe thing to do is to abort.
# (See also GitHub #4133).
except BaseException:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when #4133 was introduced CancelledError was still a subclass of Exception since 3.8 CancelledError is no-longer caught here

@fjetter
Copy link
Member

fjetter commented Jun 14, 2022

You can do an asyncio.wait_for(comm.read(...), 0.05) on a comm pair. It's just writing a test and fixing the other comms is much harder

We may want to fix TCP only at first and delay everything else until it becomes an actual problem (and if possible asyncio_tcp). The other backends are used very rarely. Fixing this for TCP already would be a big win

@graingert graingert marked this pull request as ready for review June 15, 2022 14:51
@fjetter fjetter linked an issue Jun 16, 2022 that may be closed by this pull request
@fjetter
Copy link
Member

fjetter commented Jun 16, 2022

I added a commit that adds a test and changes the exception handling in the write path to handle BaseExceptions as well. I can easily see a KeyboardInterrupt happening during write that is handled upstream, e.g. in a client jupyter session, and we wouldn't want the stream to be corrupted.
That might make it more important to take #5667 on

@fjetter fjetter merged commit b59a322 into dask:main Jun 16, 2022
pentschev added a commit to pentschev/distributed that referenced this pull request Sep 2, 2022
This change will ensure `CancelledError`s are catched upon shutting down
the Dask cluster, which may otherwise raise various errors.

See also dask#6574 .
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Comm objects do not handle cancellation correctly

2 participants