Skip to content

Conversation

@fjetter
Copy link
Member

@fjetter fjetter commented Aug 15, 2023

This is possibly why we've seen spurious CancelledErrors popping up recently.

Python 3.11 changes semantics around cancellation. The docs explicitly state that when we want to suppress cancellation we have to uncancel the task, see https://docs.python.org/3/library/asyncio-task.html#asyncio.Task.uncancel

Some background about this cancellation suppression over here

I don't really know how to test this...

@fjetter fjetter requested a review from graingert August 15, 2023 12:59
@fjetter fjetter changed the title Uncancel task in ConnectionPool Use task.uncancel when suppressing cancellation in ConnectionPool Aug 15, 2023
except asyncio.CancelledError:
current_task = asyncio.current_task()
if sys.version_info >= (3, 11):
current_task.uncancel()
Copy link
Member

@graingert graingert Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't quite the right usage for uncancel. It should only be called if self._reasons.pop(current_task is set and then CancelledError should be re-raised if uncancel() != 0

Copy link
Member

@graingert graingert Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option is to (on python3.11) use

try:
    async asyncio.timeout(math.inf) as scope:
        await ...
except TimeoutError:
    raise CommClosedError(...)

And then scope.reschedule(-1) to cancel the task, then we can rely on cpython code to catch the CancelledError and call uncancel() correctly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, but I won't rewrite the entire code for a py3.11 only option

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already started on it here: #8109 😝

Copy link
Member

@graingert graingert Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth it as we can skip a load of bookkeeping for external/internal cancellation on 3.11, and "end-user code" isn't really supposed to call .uncancel():

This method is used by asyncio’s internals and isn’t expected to be used by end-user code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also I was wrong on how to use uncancel() correctly here, You need to record the cancelling state: https://github.com/python/cpython/blob/e2420c5cae7a173c5242b3507979010a933e53a7/Lib/asyncio/timeouts.py#L87 then check https://github.com/python/cpython/blob/3.11/Lib/asyncio/timeouts.py#L108 if self._task.uncancel() <= self._cancelling and exc_type is exceptions.CancelledError:

except asyncio.CancelledError:
current_task = asyncio.current_task()
if sys.version_info >= (3, 11):
current_task.uncancel()
Copy link
Member

@graingert graingert Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One option is to (on python3.11) use

try:
    async asyncio.timeout(math.inf) as scope:
        await ...
except TimeoutError:
    raise CommClosedError(...)

And then scope.reschedule(-1) to cancel the task, then we can rely on cpython code to catch the CancelledError and call uncancel() correctly

@github-actions
Copy link
Contributor

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       21 files  ±       0         21 suites  ±0   12h 9m 50s ⏱️ + 1h 11m 40s
  3 767 tests  -        2    3 657 ✔️ +       4     107 💤  -   2  3  - 4 
36 427 runs  +1 248  34 621 ✔️ +1 209  1 801 💤 +41  5  - 2 

For more details on these failures, see this check.

Results for commit ddfa14a. ± Comparison against base commit 8aa04a8.

This pull request removes 2 tests.
distributed.cli.tests.test_dask_worker.test_listen_address_ipv6[tcp:..[ ‑ 1]:---nanny]
distributed.cli.tests.test_dask_worker.test_listen_address_ipv6[tcp:..[ ‑ 1]:---no-nanny]

@fjetter
Copy link
Member Author

fjetter commented Aug 15, 2023

FWIW I am not sure if this is even worth doing. I found this when investigating flaky tests and while trying to find out why #8013 was necessary. This cannot explain #8013 since the problems also happen in py3.10 and py3.9

I could trace this all back to #8110 but am still lacking a very thorough explanation or a proper unit test

@fjetter fjetter closed this Aug 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants