[Original title: Should SendStream automatically serialize calls to send_all?]
(Context: discussion with @agronholm at https://gitter.im/python-trio/general?at=59c03f58c101bc4e3ae28c81)
Currently, if two tasks call `SendStream.send_all` at the same time, then the second one raises an error, and if you want to write to the same stream from multiple tasks then you have to protect the stream with a `Lock`. We could alternatively require that stream implementations make this "just work", basically moving the lock inside `send_all`.
Changing this wouldn't be a compat breaker for the users of the Stream API, because it would convert an error into something that works. But it would be a compat breaker for implementors of the Stream API, because now their users might start depending on this new behavior. So we should probably make a decision sooner rather than later.
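To make the proposal concrete, here's a minimal sketch of what "moving the lock inside `send_all`" could look like. All the names here are hypothetical, and `asyncio.Lock` is used only so the sketch is self-contained and runnable; in trio the same role would be played by `trio.Lock`.

```python
import asyncio


class LockedSendStream:
    """Hypothetical wrapper sketching the proposal: a second concurrent
    send_all call blocks on an internal lock instead of raising."""

    def __init__(self, transport):
        self._transport = transport  # assumed to have an async send_all method
        self._send_lock = asyncio.Lock()

    async def send_all(self, data):
        # A concurrent caller parks here until the first send finishes, so
        # two tasks' writes can never interleave on the wire.
        async with self._send_lock:
            await self._transport.send_all(data)
```

From the caller's side this is exactly the "just works" behavior: two tasks can both do `await stream.send_all(...)` with no explicit locking.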
receive_some
For `receive_some`, I don't think this makes sense; if you have multiple tasks trying to read from the same stream then you generally need to rethink things. (How do you make sure that the correct bytes go to the correct task? There's no easy answer.) Of course, it's also hard to make `receive_some` actually broken even if we did allow concurrent calls – there's no equivalent to "interleaving" like can happen with `send_all`. But it will raise errors sometimes if there's no explicit locking, because `trio.hazmat.wait_readable` raises errors if two tasks try to block in it at the same time. ...I guess really this is exactly the same cases where it raises an error now with the explicit conflict detection though, give or take a checkpoint.
send_all - is it possible?
For one task calling `send_all` and another calling `wait_send_all_might_not_block`, or two tasks calling `wait_send_all_might_not_block`... bleh. Giving an error is fairly reasonable, but maybe we can do better. If we allow two tasks to concurrently do:

```python
while True:
    await stream.send_all(...)
```

then we should probably also allow two tasks to concurrently do:

```python
while True:
    await stream.wait_send_all_might_not_block()
    await stream.send_all(...)
```

which would mean supporting all combinations of `send_all` and `wait_send_all_might_not_block`.
What if we simply protected both methods with the same lock?
- `wait_send_all_might_not_block` is holding the lock, `send_all` arrives: `send_all` ends up blocking until `wait_send_all_might_not_block` returns. Ok, sure, by definition this was going to happen anyway. The `send_all` might mean that whoever called `wait_send_all_might_not_block` is surprised because it does block, but that's part of the contract anyway (hence the word *might*).
- `send_all` is holding the lock, `wait_send_all_might_not_block` arrives: an immediate call to `send_all` would block until it got the lock, so `wait_send_all_might_not_block` should block until the lock is available. OK.
- `wait_send_all_might_not_block` is holding the lock, `wait_send_all_might_not_block` arrives: this is a little weird, but I guess it works out ok. The second one can't proceed until the first one returns. But we know that the first one will return as soon as `send_all` might not block, so this can't directly cause the second one to block longer than it should have. And then when it finally gets the lock, it should detect the same stream state that the first one did, and return immediately. (Unless something else happened in the meantime to change the state, but in that case blocking longer is appropriate anyway.)
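Putting the three cases together, the one-shared-lock scheme could be sketched like this. The wrapper and its names are hypothetical, and `asyncio.Lock` stands in for `trio.Lock` purely to keep the sketch self-contained:

```python
import asyncio


class SharedLockStream:
    """Hypothetical sketch: send_all and wait_send_all_might_not_block
    serialized by one shared lock."""

    def __init__(self, transport):
        self._transport = transport
        self._lock = asyncio.Lock()  # stands in for trio.Lock

    async def send_all(self, data):
        async with self._lock:
            await self._transport.send_all(data)

    async def wait_send_all_might_not_block(self):
        # Holding the same lock means an in-progress send_all forces this
        # call to wait -- the "send_all is holding the lock" case above.
        async with self._lock:
            await self._transport.wait_send_all_might_not_block()
```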
is it a good idea?
So, it looks like we could potentially change this. Is it a good idea?
In theory a `trio.Lock` is a little more heavyweight than a `trio._util.ConflictDetector`, but the difference is pretty minor. Mostly the `Lock` needs a bit more memory to hold the `ParkingLot` used when there's contention; a `ParkingLot` is just an object holding an `OrderedDict`. `OrderedDict` is bigger than you'd think (`sys.getsizeof(OrderedDict()) == 416` on my laptop), but we could allocate it lazily if it really became an issue, and Streams are somewhat substantial objects anyway (they generally hold kernel buffers, etc.). And acquiring/releasing an uncontended `Lock` is barely any more expensive than acquiring/releasing a `ConflictDetector`. One difference is that blocking to get a `Lock` requires async context, but for `send_all` and friends this is fine.
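For comparison, a conflict detector in the spirit of `trio._util.ConflictDetector` is little more than a flag plus an immediate error -- no parking lot, no waiting -- which is why it's lighter than a `Lock`. A simplified sketch (not trio's actual code; trio raises its own error type rather than `RuntimeError`):

```python
class ConflictDetector:
    """Simplified sketch: detect concurrent use and fail immediately,
    rather than queueing the second caller the way a Lock would."""

    def __init__(self, msg):
        self._msg = msg
        self._held = False

    def __enter__(self):
        if self._held:
            # No blocking path at all: the second entrant just errors out.
            raise RuntimeError(self._msg)
        self._held = True

    def __exit__(self, *exc):
        self._held = False
```

Note the whole thing is synchronous: since it never blocks, it needs no async context and no per-waiter storage, which is the memory difference discussed above.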
A bigger question is whether this actually gives more friendly semantics. It's certainly convenient for message-based protocols like websocket, where you might want to have a `WSConnection` object where you do `await conn.send_message("...")` which packs that string into a websocket frame and passes the frame into the underlying byte-stream in a single call -- right now this requires explicit locking if you want `send_message` to be usable from different tasks, and if `send_all` did its own locking then potentially it wouldn't. Specifically, it would work OK if you make sure to write `send_message` so that all the internal protocol state manipulation happens synchronously, and then the `send_all` call happens at the end. This is probably the most natural way to write this (especially if working on top of a sans-io library like wsproto), but it's certainly possible to get it wrong if you don't pay attention. OTOH if streams require users to do their own locking, then the natural way to do this locking is to put it around the whole `send_message` body, and then you don't have to worry about auditing `send_message` to make sure that it contains exactly one checkpoint.
There are also ways of working with streams that are inherently task-unsafe regardless of what kind of implicit locking we do. E.g. still with the websocket example, someone could write:
```python
async def send_message(self, body):
    await self.transport.send_all(self._make_frame_header(body))
    await self.transport.send_all(self._make_frame_body(body))
```

Now this method is definitely not safe to call concurrently from multiple tasks. If you do it anyway, then with the current design, it may or may not raise an error and point out the problem; with implicit locking, it definitely never raises an error. So arguably the current design does better here?
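For contrast, the "safe" shape described earlier -- do all the protocol-state manipulation synchronously and finish with exactly one `send_all` call -- might look like this. The class and the framing helper are hypothetical (a stand-in for what wsproto would produce, not real websocket framing):

```python
class WSConnectionSketch:
    """Hypothetical sketch of the safe pattern: build the whole frame
    synchronously, then hand it to the stream in a single send_all call."""

    def __init__(self, transport):
        self.transport = transport

    def _make_frame(self, body: bytes) -> bytes:
        # Stand-in framing: 2-byte big-endian length prefix, NOT the real
        # websocket wire format.
        return len(body).to_bytes(2, "big") + body

    async def send_message(self, body: bytes):
        frame = self._make_frame(body)        # synchronous: no checkpoint
        await self.transport.send_all(frame)  # the only await in the method
```

With this shape, an implicitly-locked `send_all` is enough to make `send_message` safe to call from multiple tasks, because there is only one checkpoint and the frame crosses it whole.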