Provide some standard mechanism for splitting a stream into lines, and other basic protocol tasks #796

Most networking libraries provide some standard way to implement basic protocol building blocks like "split a stream into lines", "read exactly N bytes", or "split a stream into length-prefixed frames", e.g.:

* asyncio [`StreamReader.readline`](https://docs.python.org/3/library/asyncio-stream.html#asyncio.StreamReader.readline), [`StreamReader.readexactly`](https://docs.python.org/3/library/asyncio-stream.html#asyncio.StreamReader.readexactly), [`StreamReader.readuntil`](https://docs.python.org/3/library/asyncio-stream.html#asyncio.StreamReader.readuntil)

* The classes in [`twisted.protocols.basic`](https://twistedmatrix.com/documents/current/api/twisted.protocols.basic.html)

* The stdlib socket module's [`makefile` method](https://docs.python.org/3/library/socket.html#socket.socket.makefile), which gives you access to the full Python file API, including `readline` and friends

* Tornado `IOStream`'s [`read_until`](https://www.tornadoweb.org/en/stable/iostream.html#tornado.iostream.BaseIOStream.read_until)

We don't have anything like this currently, as I was reminded by [this StackOverflow question](https://stackoverflow.com/questions/53575979/how-can-i-read-one-line-at-a-time-from-a-trio-receivestream) from @basak.

**Note: if you're just looking for a quick way to read lines from a trio Stream, then click on that SO link, it has an example.**

# Use cases

* Simple protocols used in tutorials, to make it easy for beginners to get something working
* A base for implementing substantial/standard protocols that happen to use one of these framing methods. This mostly applies to the line-based framing, e.g. twisted's [`LineReceiver`](https://twistedmatrix.com/documents/current/api/twisted.protocols.basic.LineReceiver.html) and [`LineOnlyReceiver`](https://twistedmatrix.com/documents/current/api/twisted.protocols.basic.LineOnlyReceiver.html) have subclasses implementing HTTP, IMAP, POP3, SMTP, Ident, Finger, FTP, Memcache, IRC, ... you get the idea.
* Inventing little private mini-protocols where you don't want to have to build basic framing from scratch. I think a lot of cases that used to use this kind of thing nowadays use HTTP or WebSocket or ZeroMQ, but it still comes up occasionally. This mostly involves the length-prefixed framing variants (e.g. [twisted AMP](https://twistedmatrix.com/documents/current/api/twisted.protocols.amp.BinaryBoxProtocol.html) subclasses `Int16Receiver`), though sometimes it involves lines, e.g. newline-terminated JSON, or the log parser in [linehaul](https://github.com/pypa/linehaul).
* If you have to script an interactive subprocess that was never designed to be scripted, then `readline` and `read_until` are pretty useful. This particular case can also benefit from more sophisticated tools, like TTY emulation and [pexpect](https://pexpect.readthedocs.io/en/stable/)-style pattern matching.

# Considerations

Our approach shouldn't involve adding new methods to `Stream`, because the point of the `Stream` interface is to allow for lots of different implementations, and we don't want to force everyone who implements `Stream` to reimplement their own version of the standard frame-splitting algorithms. So this should be some helper function that acts on a `Stream`, or a wrapper class that has-a `Stream`, something like that.

For "real" protocols like HTTP, you definitely *can* implement them on top of explicit (async) blocking I/O operations like `readline` and `read_exactly`, but these days I'm pretty convinced that you will be happier using [Sans I/O](https://sans-io.readthedocs.io/). Some of the arguments for sans-io design are kind of pure and theoretical, like "better modularity" and "higher reusability", but having done this twice now (with h11 and wsproto), I really don't feel like it's an eat-your-vegetables thing – the benefits are super practical: like, you can actually understand your protocol code, and test it, and people with totally different use cases show up to fix bugs for you. It's just a more pleasant way to do things.

OTOH, while trio is generally kind of opinionated and we should give confused users helpful nudges in the best direction we can, we don't want to be elitist. If someone's used to hacking together simple protocols using `readline`, and is comfortable doing that, we don't want to put up barriers to their using trio. And if the sans-I/O approach is harder to get started with, then for some people that will legitimately outweigh the long-term benefits.

There might be one way to have our cake and eat it too: if we can *make* the sans-I/O version so simple and easy to get started with that even beginners and folks used to `readline` don't find it a barrier. *If* we can pull this off, it'd be pretty sweet, because then we can teach the better approach from the beginning, and when they move on to implementing more complex protocols, or integrating existing libraries like h11/h2/wsproto, they're already prepared to do it right.

Alternatively, if we can't... there is really not a lot of harm in having a `lines_from_stream` generator, or whatever. But anything more than that is going to require exposing some kind of buffering to the user, which is the core of the sans-I/O pattern, so let's think about sans-I/O for a bit.

# Can we make sans-I/O accessible and easy?

The core parts of implementing a high-quality streaming line reader, a streaming length-prefixed string reader, or an HTTP parser, are actually all kind of the same:

* You need a buffer
* It needs an efficient append-to-the-end operation
* It needs an efficient extract-from-the-beginning operation
* You need to be able to scan the buffer for a delimiter, with some cleverness to track how far you've scanned to avoid O(n^2) rescans after new data is added
* And some kind of maximum buffer size to avoid memory DoS
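As a rough illustration, the requirements above could be sketched like this. The names `ReceiveBuffer`, `extract_until`, `extract_exactly`, and `BufferTooLarge` are made up for this example and are not h11's or trio's API, and checking the size limit after a failed search is just one possible place to enforce it.

```python
class BufferTooLarge(Exception):
    """Raised when no delimiter shows up within the size limit."""


class ReceiveBuffer:
    """Hypothetical sketch of a delimiter-scanning receive buffer."""

    def __init__(self, max_size=16384):
        self._data = bytearray()
        self._max_size = max_size
        self._scanned = 0            # bytes already searched without a match
        self._scan_delimiter = None  # delimiter that _scanned refers to

    def append(self, chunk):
        # bytearray += is an amortized O(len(chunk)) append.
        self._data += chunk

    def extract_exactly(self, count):
        """Return the first `count` bytes, or None if not enough are buffered."""
        if len(self._data) < count:
            return None
        out = bytes(self._data[:count])
        del self._data[:count]
        self._scanned = 0  # the remaining bytes have not been searched yet
        return out

    def extract_until(self, delimiter):
        """Return data up to and including `delimiter`, or None if absent."""
        if delimiter != self._scan_delimiter:
            # Earlier scan progress only applies to the earlier delimiter.
            self._scan_delimiter = delimiter
            self._scanned = 0
        # Resume where the last search stopped, backing up just enough to
        # catch a delimiter that was split across two chunks.
        start = max(0, self._scanned - len(delimiter) + 1)
        index = self._data.find(delimiter, start)
        if index == -1:
            self._scanned = len(self._data)
            if len(self._data) > self._max_size:
                raise BufferTooLarge("no delimiter within max_size bytes")
            return None
        return self.extract_exactly(index + len(delimiter))
```

Because each call to `extract_until` resumes from `_scanned`, feeding a long line one byte at a time costs linear rather than quadratic work overall.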

h11 internally has a [robust implementation](https://github.com/python-hyper/h11/blob/master/h11/_receivebuffer.py) of everything here except for specifying delimiters as a regex, and I need to add that anyway to fix https://github.com/python-hyper/h11/issues/7. So I have a plan already to pull that out into a [standalone library](https://github.com/njsmith/sansio_toolbelt).

And the APIs for a sans-I/O line reader, length-prefixed string reader, HTTP parser, or websocket parser for that matter, are also all kind of the same: you wrap them around a `Stream`, and then call a `receive` method which tries to pull some "event" out of the internal buffer, while refilling the buffer as necessary.

In fact, if you had sans-I/O versions of any of these, that all followed the same interface conventions, you could even have a single generic wrapper that binds them to a Trio stream, and implements the `ReceiveChannel` interface! Where the objects being received are lines, or `h11.Event` objects, or whatever.
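To make the idea concrete, here is a minimal sketch of such a generic wrapper. Everything in it is an assumption for illustration: the `feed`/`next_event` convention, the `LineProtocol` and `ProtocolChannel` classes, and the `InMemoryStream` stand-in, which only mimics the `receive_some` method of a trio `ReceiveStream` (returning `b""` at end-of-stream). The demo is driven with `asyncio.run` purely so it runs without trio or a network, and `EOFError` stands in for `trio.EndOfChannel`.

```python
import asyncio


class LineProtocol:
    """Hypothetical sans-I/O line splitter: bytes in, lines out, no I/O."""

    def __init__(self, delimiter=b"\n"):
        self._delimiter = delimiter
        self._buffer = bytearray()

    def feed(self, data):
        self._buffer += data

    def next_event(self):
        """Return the next complete line without its delimiter, or None."""
        index = self._buffer.find(self._delimiter)
        if index == -1:
            return None
        line = bytes(self._buffer[:index])
        del self._buffer[: index + len(self._delimiter)]
        return line


class ProtocolChannel:
    """Binds any object with feed()/next_event() to a byte stream."""

    def __init__(self, protocol, stream):
        self._protocol = protocol
        self._stream = stream

    async def receive(self):
        while True:
            event = self._protocol.next_event()
            if event is not None:
                return event
            data = await self._stream.receive_some(4096)
            if not data:
                # Stand-in for trio.EndOfChannel in this trio-free sketch.
                raise EOFError("stream closed before a complete event")
            self._protocol.feed(data)


class InMemoryStream:
    """Test stand-in that replays fixed chunks, then reports end-of-stream."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    async def receive_some(self, max_bytes=None):
        return self._chunks.pop(0) if self._chunks else b""


async def demo():
    stream = InMemoryStream([b"hel", b"lo\r\nwor", b"ld\r\n"])
    channel = ProtocolChannel(LineProtocol(delimiter=b"\r\n"), stream)
    return [await channel.receive(), await channel.receive()]


if __name__ == "__main__":
    print(asyncio.run(demo()))  # [b'hello', b'world']
```

Note that `ProtocolChannel` knows nothing about lines: any parser following the same two-method convention could be dropped in.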

So if you really just wanted a way to receive and send lines on a `Stream`, that might be:

```python
protocol = sansio_toolbelt.LineProtocol(delimiter=b"\r\n", max_line_length=16384)
line_channel: trio.abc.Channel[bytes] = sansio_toolbelt.to_trio(protocol, my_stream)

await line_channel.send(b"hello")
response = await line_channel.receive()
```

That's maybe a *little* bit more complicated than I'd want to use in a tutorial, but it's pretty close? Maybe we can slim it down a little more?

This approach is also flexible enough to handle more complex cases, like protocols that switch between line-oriented and bulk data (HTTP), or that enable TLS half-way through (SMTP's STARTTLS command), which in Twisted's `LineReceiver` requires some special hooks. You can detach the sans-I/O wrapper from the underlying stream and then wrap it again in a different protocol, so long as you have some way to hand off the buffer between them.
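The buffer hand-off could look something like the sketch below. Both classes and the `detach()` method are hypothetical, and the frame format (a 2-byte big-endian length followed by the payload) is just an assumed example of length-prefixed framing.

```python
import struct


class LineProtocol:
    """Hypothetical sans-I/O line splitter; does no I/O of its own."""

    def __init__(self, initial=b"", delimiter=b"\n"):
        self._buffer = bytearray(initial)
        self._delimiter = delimiter

    def feed(self, data):
        self._buffer += data

    def next_event(self):
        index = self._buffer.find(self._delimiter)
        if index == -1:
            return None
        line = bytes(self._buffer[:index])
        del self._buffer[: index + len(self._delimiter)]
        return line

    def detach(self):
        """Give up any unconsumed bytes so another protocol can take over."""
        leftover = bytes(self._buffer)
        self._buffer.clear()
        return leftover


class Int16FrameProtocol:
    """Hypothetical parser for frames with a 2-byte big-endian length."""

    def __init__(self, initial=b""):
        self._buffer = bytearray(initial)

    def feed(self, data):
        self._buffer += data

    def next_event(self):
        if len(self._buffer) < 2:
            return None
        (length,) = struct.unpack_from(">H", self._buffer)
        if len(self._buffer) < 2 + length:
            return None
        payload = bytes(self._buffer[2 : 2 + length])
        del self._buffer[: 2 + length]
        return payload


# One chunk carries a command line plus the start of the framed data.
lines = LineProtocol()
lines.feed(b"SWITCH\n\x00\x03abc\x00\x02h")
command = lines.next_event()  # b"SWITCH"

# Bytes that arrived after the line must not be lost in the switch.
frames = Int16FrameProtocol(initial=lines.detach())
first_frame = frames.next_event()  # b"abc"
```

The key point is that whatever arrived after the last line belongs to the next protocol, so the old parser has to surrender it explicitly.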

But while it is flexible enough for that, and that approach is very elegant for Serious Robust Protocol implementations, it might be a lot to ask when someone really just wants to call `readline` twice and then read N bytes, or something like that. So maybe we'd also want something that wraps a `ReceiveStream` and provides `read_line`, `read_exactly`, `read_until`, based on the same buffering code described above but without the fancy sans-I/O event layer in between?
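A minimal sketch of what such a wrapper might look like follows. The class name `BufferedReceiver`, its constructor arguments, and the `InMemoryStream` stand-in are assumptions, not an existing trio API; the wrapper only relies on the wrapped object having an async `receive_some` that returns `b""` at end-of-stream, as trio's `ReceiveStream` does. The demo uses `asyncio.run` only so it runs without trio installed.

```python
import asyncio


class BufferedReceiver:
    """Hypothetical helper: wraps a receive stream, adds read_* methods."""

    def __init__(self, stream, max_buffer=65536):
        self._stream = stream
        self._buffer = bytearray()
        self._max_buffer = max_buffer

    async def _fill(self):
        data = await self._stream.receive_some(4096)
        if not data:
            raise EOFError("stream ended before the request was satisfied")
        self._buffer += data

    def _take(self, count):
        out = bytes(self._buffer[:count])
        del self._buffer[:count]
        return out

    async def read_exactly(self, count):
        while len(self._buffer) < count:
            await self._fill()
        return self._take(count)

    async def read_until(self, delimiter):
        searched = 0  # bytes of the buffer already searched in this call
        while True:
            # Back up slightly in case the delimiter spans two chunks.
            start = max(0, searched - len(delimiter) + 1)
            index = self._buffer.find(delimiter, start)
            if index != -1:
                return self._take(index + len(delimiter))
            searched = len(self._buffer)
            if searched > self._max_buffer:
                raise ValueError("delimiter not found within max_buffer bytes")
            await self._fill()

    async def read_line(self):
        return await self.read_until(b"\n")


class InMemoryStream:
    """Test stand-in that replays fixed chunks, then reports end-of-stream."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    async def receive_some(self, max_bytes=None):
        return self._chunks.pop(0) if self._chunks else b""


async def demo():
    stream = InMemoryStream([b"HEAD", b"ER one\nheader two\n12", b"345rest"])
    reader = BufferedReceiver(stream)
    # "Call readline twice and then read N bytes."
    first = await reader.read_line()
    second = await reader.read_line()
    body = await reader.read_exactly(5)
    return first, second, body


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (b'HEADER one\n', b'header two\n', b'12345')
```

Unlike the event-based wrapper, the caller here decides on every call what kind of framing to expect next, which is what makes the "two lines, then N bytes" case so short.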
