What's the deal with partial cancellation

Recently we were discussing what happens if a subprocess call gets cancelled, but you want to at least find out what the subprocess said *before* it got killed. And @oremanj [wrote](https://github.com/python-trio/trio/pull/833#issuecomment-453874834):

> I appreciate the simplicity of Trio's stance that "a cancelled operation didn't happen", but it doesn't necessarily compose very well -- if an operation is built out of multiple other underlying operations that can't readily be rolled back, either the "cancelled = didn't happen" rule has to break or the entire higher-level operation has to be uncancellable once started. I don't think we want to propose the latter, so maybe we should think about a user-friendly way to talk about the circumstances in which the rule gets bent?

It's a fair point! The "cancelled operation didn't happen" thing was only ever supposed to apply to low-level, primitive operations. In that context, it's a pretty important rule, because without it you can't ever hope to build anything sensible on top. But it's never made any sense for higher-level operations (i.e., the ones that working programmers are actually interacting with 99.99% of the time). Of course, at the time the initial docs were being written, I was struggling to figure out how to get the primitive operations to work at all and there were no higher-level operations. So that rule probably gets more prominence than it should :-). But things have changed and we should have a better story here.

Recently in a discussion of how to talk about cancellation in the docs, @smurfix wrote:

>The result of cancelling something is either (a) the called code didn't do anything, raising a Cancelled exception, or (b) the called code did what it was supposed to do, returning its result normally. Of course there's also the possibility of (c) the called code got part way through and left whatever it tried to accomplish in an inconsistent state.
>
>It's probably out of Trio's scope to signal that state to the caller; there should be an attribute "is this object still usable", and/or the object should raise an `InconsistentStateError` when it's used again. We might want to document that as best practice, and maybe add that exception to Trio as a sensible default for trio-using libraries to raise.

So that's one idea for how Trio could provide concrete advice to users about how to work with partial cancellation.

I don't have any organized thoughts here, so I'm just going to dump a bunch of unorganized ones.

-----

There were two concrete proposals that @oremanj made in the subprocess discussion (unless there were more and I'm forgetting some :-)):

* Add `timeout` and `deadline` arguments to `trio.run_process`. These would have a similar effect to wrapping a cancel scope around `run_process`, *except* that if the timeout expires, then `run_process` wouldn't raise `Cancelled`, it would raise `CalledProcessError`, which would be a special exception with attributes recording whatever partial output, return code, etc., we got from the process.

  The downside of this is that it's extremely specific to subprocesses, which feels weird. The problem is really "what do you do if an operation times out and you want partial results?" – I actually have no idea what makes subprocesses special here, as compared to, I don't know, calling some docker API or something. So a solution that's specific to subprocesses doesn't feel natural. OTOH it would work, and maybe there's some reason that people need partial results from subprocesses a lot, and don't in other cases, so something simple and specific is fine.

* Give `run_process` a special (optional) semantics, where if, while running, it saw a `Cancelled` exception materialize, it would automatically replace it with `CalledProcessError`.

  This is a really intriguing idea, but makes me uncomfortable because we have no idea where that `Cancelled` is coming from – in particular, we don't know whether the code that was going to process the partial results is also cancelled, or not.

I don't actually know why @oremanj is so eager to get at partial results in this case; I gather he has some use case where he needs this feature, but I don't know what it is.

----

Another notorious example where cancellation loses information in an important way is `Stream.send_all`. Right now, if `send_all` gets cancelled, you effectively have to throw away that stream and give up, because you have no idea what data you have or haven't sent.

It wasn't always like this: originally, if `send_all` was cancelled, there was a hack where we'd attach an attribute to the `Cancelled` exception recording how many bytes we'd sent, and a sufficiently clever caller could potentially use that to reconstruct the state of the stream.

Then I added `SSLStream` and it quickly became clear that this design was no good. There are two major issues:

1. Exceptions may start out in some nice well-defined operation like `SocketStream.send_all`, but they propagate. That's what exceptions do! Right across abstraction boundaries. So, for example, if you called `SSLStream.send_all`, and it called `SocketStream.send_all`, then unless you were careful you could get an exception out of `SSLStream.send_all` that has metadata attached saying how many bytes `SocketStream.send_all` sent, which is catastrophically misleading.

2. `SSLStream` actually has some pretty complicated internal state, because, well, you know. Cryptography. In particular, cancellation is very different: with something like `SocketStream`, if `send_all` is cancelled in the middle, that's pretty simple: you sent the first N bytes, but not the rest. With `SSLStream`, though, `send_all` immediately *commits* to sending *all* the bytes, before it sends *any* of them. So if it gets cancelled, then we're in this weird state where it's sent some of the bytes, and it's committed to sending the rest, but it hasn't yet. Oh, and we don't even know how many user-level bytes have actually been transmitted in a way that the other side can read them. (Like, we might know we sent 500 bytes on the underlying socket, but maybe 100 of those are protocol framing, and then the last 50 are actual application data *but* it's application data that the other side can't decrypt until we send another 50 bytes to complete that frame... it's really messy.) There just is no useful way to communicate the state of an `SSLStream` after `send_all` is cancelled, no matter what metadata we attach to what exceptions.
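
Here's a deliberately tiny, synchronous toy model of the first issue. None of these classes are Trio's: `FakeCancelled` stands in for a cancellation exception carrying the old "bytes sent" attribute, `ToySocket` stands in for the lower-level stream, and `ToyFramedStream` stands in for a wrapper (like `SSLStream`, but with a trivial 4-byte length header instead of real cryptography). It just shows how the metadata ends up counting the wrong layer's bytes once the exception crosses the abstraction boundary.

```python
class FakeCancelled(BaseException):
    """Stand-in for a cancellation exception carrying the metadata hack."""

    def __init__(self, bytes_sent):
        super().__init__(bytes_sent)
        self.bytes_sent = bytes_sent


class ToySocket:
    """Lower layer: 'sends' at most `budget` bytes, then gets 'cancelled'."""

    def __init__(self, budget):
        self.budget = budget
        self.wire = bytearray()

    def send_all(self, data):
        sendable = data[: self.budget]
        self.wire += sendable
        self.budget -= len(sendable)
        if len(sendable) < len(data):
            raise FakeCancelled(bytes_sent=len(sendable))


class ToyFramedStream:
    """Upper layer: prefixes each message with a 4-byte length header."""

    def __init__(self, transport):
        self.transport = transport

    def send_all(self, data):
        frame = len(data).to_bytes(4, "big") + data
        # Nothing here catches FakeCancelled, so it propagates unchanged,
        # still describing the transport's bytes rather than ours.
        self.transport.send_all(frame)


def demo():
    stream = ToyFramedStream(ToySocket(budget=6))
    try:
        stream.send_all(b"hello world")
    except FakeCancelled as exc:
        reported = exc.bytes_sent  # wire bytes, header included
    # How much of the caller's payload actually made it out:
    app_bytes_on_wire = max(0, len(stream.transport.wire) - 4)
    return reported, app_bytes_on_wire
```

`demo()` returns `(6, 2)`: the exception claims 6 bytes were sent, but only 2 bytes of the caller's payload are on the wire. And that's the easy case, where the framing is a fixed-size header.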

So, instead, we've been going ahead with the rule that once a `send_all` is cancelled, your stream is doomed. We *haven't* done anything to detect this and e.g. raise an error if you try calling `send_all` again after a cancelled `send_all`, like in @smurfix's suggestion.... maybe we should?

And then as a consequence, for downstream users, like trio-websocket, what we've been converging on is basically the rule that only one task should "own" a `Stream` for sending at a time – if you want a stream to survive sending from multiple tasks, then you create a background task that handles the `send_all` calls, and the other tasks send stuff to that task over some kind of channel. As @mehaase recently pointed out in https://github.com/python-trio/trio/issues/328#issuecomment-457643176, we might want to start documenting this more thoroughly? (#328 is generally relevant to these issues – it's ostensibly about `send_all` and locking, but really it's about sharing a stream between multiple tasks, and cancellation turns out to be a major consideration there.)

This does seem to be working out pretty well. So I guess the moral is that at least in this area, "partial results" just aren't an important case to think about. All the cases we care about are either "leaves the state inconsistent" or "atomic", and you can build the latter on top of the former (!) by using a background task + a channel, b/c the *channel's* `send` operation is atomic.

----

Some of this comment also feels relevant, especially the bit about "what does cancellation mean" near the end: https://github.com/python-trio/trio/issues/147#issuecomment-453561373
