Description
Most networking libraries provide some standard way to implement basic protocol building blocks like "split a stream into lines", "read exactly N bytes", or "split a stream into length-prefixed frames", e.g.:
- asyncio's StreamReader.readline, StreamReader.readexactly, StreamReader.readuntil
- The classes in twisted.protocols.basic
- The stdlib socket module's makefile method, which lets you get access to the full Python file API, including readline and friends
- Tornado IOStream's read_until
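For reference, here is roughly what those primitives look like in asyncio. This is only a small sketch: the reader is fed by hand instead of being attached to a real connection, and the sample bytes and the demo function are made up for illustration.

```python
import asyncio


async def demo() -> tuple[bytes, bytes, bytes]:
    # A StreamReader can be fed manually, which lets us try the framing
    # methods without opening a socket.
    reader = asyncio.StreamReader()
    reader.feed_data(b"first line\n\x00\x05helloHEADER\r\n\r\nrest")
    reader.feed_eof()

    # "split a stream into lines": returns data up to and including b"\n"
    line = await reader.readline()

    # "length-prefixed frame": read a 2-byte big-endian length, then the body
    prefix = await reader.readexactly(2)
    payload = await reader.readexactly(int.from_bytes(prefix, "big"))

    # "read until delimiter": the returned data includes the separator
    header = await reader.readuntil(b"\r\n\r\n")
    return line, payload, header


if __name__ == "__main__":
    print(asyncio.run(demo()))
```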
We don't have anything like this currently, as I was reminded by this StackOverflow question from @basak.
Note: if you're just looking for a quick way to read lines from a trio Stream, then click on that SO link, it has an example.
Use cases
- Simple protocols used in tutorials, to make it easy for beginners to get something working
- A base for implementing substantial/standard protocols that happen to use one of these framing methods. This mostly applies to the line-based framing, e.g. twisted's LineReceiver and LineOnlyReceiver have subclasses implementing HTTP, IMAP, POP3, SMTP, Ident, Finger, FTP, Memcache, IRC, ... you get the idea.
- Inventing little private mini-protocols where you don't want to have to build basic framing from scratch. I think a lot of cases that used to use this kind of thing nowadays use HTTP or WebSocket or ZeroMQ, but it still comes up occasionally. This mostly involves the length-prefixed framing variants (e.g. twisted AMP subclasses Int16Receiver), though sometimes it involves lines, e.g. newline-terminated JSON, or the log parser in linehaul.
- If you have to script an interactive subprocess that was never designed to be scripted, then readline and read_until are pretty useful. This particular case can also benefit from more sophisticated tools, like TTY emulation and pexpect-style pattern matching.
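To make the "length-prefixed framing" case concrete, here is a minimal sketch of a 16-bit length prefix, in the spirit of the Twisted receivers mentioned above. The function names and the choice of a big-endian 2-byte header are assumptions for illustration, not anyone's actual wire format.

```python
import struct

# Assumed header layout: a 2-byte unsigned big-endian payload length.
HEADER = struct.Struct("!H")


def encode_frame(payload: bytes) -> bytes:
    """Prefix payload with its length."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a 16-bit length prefix")
    return HEADER.pack(len(payload)) + payload


def decode_frames(buffer: bytearray) -> list[bytes]:
    """Remove and return every complete frame at the start of buffer.

    Any trailing partial frame is left in the buffer for the next call.
    """
    frames = []
    while len(buffer) >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer)
        end = HEADER.size + length
        if len(buffer) < end:
            break  # the body has not fully arrived yet
        frames.append(bytes(buffer[HEADER.size:end]))
        del buffer[:end]
    return frames
```

The caller appends whatever bytes arrive to the bytearray and calls decode_frames after each append; no I/O happens inside these functions.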
Considerations
Our approach shouldn't involve adding new methods to Stream, because the point of the Stream interface is to allow for lots of different implementations, and we don't want to force everyone who implements Stream to reimplement their own version of the standard frame-splitting algorithms. So this should be some helper function that acts on a Stream, or a wrapper class that has-a Stream, something like that.
For "real" protocols like HTTP, you definitely can implement them on top of explicit (async) blocking I/O operations like readline and read_exactly, but these days I'm pretty convinced that you will be happier using Sans I/O. Some of the arguments for sans-io design are kind of pure and theoretical, like "better modularity" and "higher reusability", but having done this twice now (with h11 and wsproto), I really don't feel like it's an eat-your-vegetables thing – the benefits are super practical: like, you can actually understand your protocol code, and test it, and people with totally different use cases show up to fix bugs for you. It's just a more pleasant way to do things.
OTOH, while trio is generally kind of opinionated and we should give confused users helpful nudges in the best direction we can, we don't want to be elitist. If someone's used to hacking together simple protocols using readline, and is comfortable doing that, we don't want to put up barriers to their using trio. And if the sans-I/O approach is harder to get started with, then for some people that will legitimately outweigh the long-term benefits.
There might be one way to have our cake and eat it too: if we can make the sans-I/O version so simple and easy to get started with that even beginners and folks used to readline don't find it a barrier. If we can pull this off, it'd be pretty sweet, because then we can teach the better approach from the beginning, and when they move on to implementing more complex protocols, or integrating existing libraries like h11/h2/wsproto, they're already prepared to do it right.
Alternatively, if we can't... there is really not a lot of harm in having a lines_from_stream generator, or whatever. But anything more than that is going to require exposing some kind of buffering to the user, which is the core of the sans-I/O pattern, so let's think about sans-I/O for a bit.
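A lines_from_stream generator could look something like the sketch below. The signature and defaults are guesses, not a settled API. It only assumes the stream has a Trio-style async receive_some(max_bytes) method that returns b"" at end-of-stream; the FakeStream class is a stand-in so the sketch can be run without a real connection.

```python
import asyncio


async def lines_from_stream(stream, delimiter=b"\n", max_line_length=16384):
    """Yield delimiter-terminated lines (without the delimiter) from stream."""
    buffer = bytearray()
    while True:
        data = await stream.receive_some(4096)
        if not data:  # b"" signals end-of-stream
            break
        buffer += data
        # Note: this rescans the buffer from the start after every chunk,
        # which is simple but not the smartest possible scan.
        while (index := buffer.find(delimiter)) >= 0:
            yield bytes(buffer[:index])
            del buffer[: index + len(delimiter)]
        if len(buffer) > max_line_length:
            raise ValueError("line too long")
    if buffer:
        yield bytes(buffer)  # trailing data with no final delimiter


class FakeStream:
    """Stand-in for a real stream: hands out pre-made chunks, then b""."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    async def receive_some(self, max_bytes=None):
        return self._chunks.pop(0) if self._chunks else b""


async def collect(chunks):
    return [line async for line in lines_from_stream(FakeStream(chunks))]
```

The buffer here is private to the generator, which is exactly the limitation: once you stop iterating, any bytes already received but not yet yielded are lost to the caller.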
Can we make sans-I/O accessible and easy?
The core parts of implementing a high-quality streaming line reader, a streaming length-prefixed string reader, or an HTTP parser, are actually all kind of the same:
- You need a buffer
- It needs an efficient append-to-the-end operation
- It needs an efficient extract-from-the-beginning operation
- You need to be able to scan the buffer for a delimiter, with some cleverness to track how far you've scanned to avoid O(n^2) rescans after new data is added
- And some kind of maximum buffer size to avoid memory DoS
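The requirements above can be sketched as a small standalone class. This is not h11's actual code, just an illustration under my own naming (ReceiveBuffer, BufferTooLarge, extract_until are all hypothetical). It leans on bytearray for the append and delete-from-front operations, and remembers how far it has already searched so that a delimiter scan does not restart from zero after each append.

```python
from typing import Optional


class BufferTooLarge(Exception):
    """Raised when appending would exceed the configured maximum size."""


class ReceiveBuffer:
    def __init__(self, max_size: int = 65536) -> None:
        self._data = bytearray()
        self._max_size = max_size
        self._delimiter = b""  # delimiter used by the last scan
        self._scanned = 0      # bytes already searched for that delimiter

    def __len__(self) -> int:
        return len(self._data)

    def append(self, data: bytes) -> None:
        # Enforce the size cap up front to avoid unbounded memory growth.
        if len(self._data) + len(data) > self._max_size:
            raise BufferTooLarge(f"buffer would exceed {self._max_size} bytes")
        self._data += data

    def extract(self, count: int) -> bytes:
        """Remove and return up to count bytes from the front."""
        out = bytes(self._data[:count])
        del self._data[:count]
        self._scanned = 0  # offsets shifted, so start the next scan afresh
        return out

    def extract_until(self, delimiter: bytes) -> Optional[bytes]:
        """Return data through the delimiter, or None if it hasn't arrived."""
        if delimiter != self._delimiter:
            self._delimiter, self._scanned = delimiter, 0
        # Back up by len(delimiter) - 1 so that a delimiter split across
        # two appends is still found.
        start = max(0, self._scanned - (len(delimiter) - 1))
        index = self._data.find(delimiter, start)
        if index < 0:
            self._scanned = len(self._data)
            return None
        return self.extract(index + len(delimiter))
```

Usage is: append whatever arrives, call extract_until, and if it returns None go get more data and try again.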
h11 internally has a robust implementation of everything here except for specifying delimiters as a regex, and I need to add that anyway to fix python-hyper/h11#7. So I have a plan already to pull that out into a standalone library.
And the APIs of a sans-I/O line reader, length-prefixed string reader, HTTP parser, or websocket parser for that matter, are also all kind of the same: you wrap them around a Stream, and then call a receive method which tries to pull some "event" out of the internal buffer, refilling the buffer as necessary.
In fact, if you had sans-I/O versions of any of these, that all followed the same interface conventions, you could even have a single generic wrapper that binds them to a Trio stream, and implements the ReceiveChannel interface! Where the objects being received are lines, or h11.Event objects, or whatever.
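Here is a rough sketch of what such a generic wrapper might look like. Everything in it is hypothetical: the feed/next_event convention, the LineProtocol and ProtocolReceiver names, and the local EndOfChannel exception (a stand-in for trio.EndOfChannel so the sketch runs without Trio). The stream only needs a Trio-style async receive_some method.

```python
import asyncio


class EndOfChannel(Exception):
    """Stand-in for trio.EndOfChannel."""


class LineProtocol:
    """Hypothetical sans-I/O object: bytes go in, complete lines come out."""

    def __init__(self, delimiter=b"\r\n"):
        self._delimiter = delimiter
        self._buffer = bytearray()

    def feed(self, data: bytes) -> None:
        self._buffer += data

    def next_event(self):
        """Return the next line, or None if more data is needed."""
        index = self._buffer.find(self._delimiter)
        if index < 0:
            return None
        line = bytes(self._buffer[:index])
        del self._buffer[: index + len(self._delimiter)]
        return line


class ProtocolReceiver:
    """Binds any object with feed()/next_event() to a stream."""

    def __init__(self, protocol, stream):
        self._protocol = protocol
        self._stream = stream

    async def receive(self):
        while True:
            event = self._protocol.next_event()
            if event is not None:
                return event
            data = await self._stream.receive_some(4096)
            if not data:
                raise EndOfChannel
            self._protocol.feed(data)


class FakeStream:
    """Stand-in stream that hands out pre-made chunks, then b""."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    async def receive_some(self, max_bytes=None):
        return self._chunks.pop(0) if self._chunks else b""


async def demo():
    stream = FakeStream([b"hel", b"lo\r\nwor", b"ld\r\n"])
    receiver = ProtocolReceiver(LineProtocol(), stream)
    events = []
    try:
        while True:
            events.append(await receiver.receive())
    except EndOfChannel:
        return events
```

ProtocolReceiver knows nothing about lines, so the same wrapper would work for any protocol object that follows the same two-method convention.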
So if you really just wanted a way to receive and send lines on a Stream, that might be:
```python
# Proposed API sketch: sansio_toolbelt is a placeholder name for the
# standalone library discussed above, and my_stream is an existing Stream.
line_channel: trio.abc.Channel[bytes] = sansio_toolbelt.to_trio(
    sansio_toolbelt.LineProtocol(delimiter=b"\r\n", max_line_length=16384),
    my_stream,
)
await line_channel.send(b"hello")
response = await line_channel.receive()
```

That's maybe a little bit more complicated than I'd want to use in a tutorial, but it's pretty close? Maybe we can slim it down a little more?
This approach is also flexible enough to handle more complex cases, like protocols that switch between line-oriented and bulk data (HTTP), or that enable TLS half-way through (SMTP's STARTTLS command), which in Twisted's LineReceiver requires some special hooks. You can detach the sans-I/O wrapper from the underlying stream and then wrap it again in a different protocol, so long as you have some way to hand off the buffer between them.
But while it is flexible enough for that, and that approach is very elegant for Serious Robust Protocol implementations, it might be a lot to ask when someone really just wants to call readline twice and then read N bytes, or something like that. So maybe we'd also want something that wraps a ReceiveStream and provides read_line, read_exactly, read_until, based on the same buffering code described above but without the fancy sans-I/O event layer in between?
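A sketch of that simpler wrapper, for the "call readline twice and then read N bytes" case. The class name, the EOFError/ValueError choices, and the default sizes are all assumptions; the wrapped object only needs a Trio-style async receive_some method, and FakeStream is again just a stand-in.

```python
import asyncio


class BufferedReceiver:
    def __init__(self, stream, max_buffer_size=65536):
        self._stream = stream
        self._buffer = bytearray()
        self._max_buffer_size = max_buffer_size

    async def _fill(self):
        if len(self._buffer) >= self._max_buffer_size:
            raise ValueError("buffer limit reached before request was satisfied")
        data = await self._stream.receive_some(4096)
        if not data:
            raise EOFError("stream ended before request was satisfied")
        self._buffer += data

    def _take(self, count):
        out = bytes(self._buffer[:count])
        del self._buffer[:count]
        return out

    async def read_exactly(self, count):
        while len(self._buffer) < count:
            await self._fill()
        return self._take(count)

    async def read_until(self, delimiter):
        """Return data up to and including delimiter."""
        start = 0
        while True:
            index = self._buffer.find(delimiter, start)
            if index >= 0:
                return self._take(index + len(delimiter))
            # Only the tail can still contain the start of a delimiter.
            start = max(0, len(self._buffer) - len(delimiter) + 1)
            await self._fill()

    async def read_line(self):
        return await self.read_until(b"\n")


class FakeStream:
    """Stand-in stream that hands out pre-made chunks, then b""."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    async def receive_some(self, max_bytes=None):
        return self._chunks.pop(0) if self._chunks else b""


async def demo():
    stream = FakeStream([b"Content-Length: 5\r\n", b"\r\nhel", b"lo"])
    reader = BufferedReceiver(stream)
    header = await reader.read_line()
    blank = await reader.read_line()
    length = int(header.split(b":")[1].strip())
    body = await reader.read_exactly(length)
    return header, blank, body
```

Because the buffer lives on the wrapper object and not inside a generator, switching between line reads and fixed-size reads does not lose any already-received bytes.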