@glyph gave a great talk at PyCon this year that involved using a virtual (= in memory, in python) networking layer to build a virtual server to test a real client.
As far as the virtual networking part goes, we have some of this, e.g. #107 has some pretty solid in-memory implementations of the stream abstraction. But it would be neat to virtualize more of networking, e.g. so in a test I can tell my real server code to listen on some-server.example.org:12345 and tell my real client code to connect to that, and they magically get an in-memory connection between them.
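To make the "listen on a name, connect to that name" idea a bit more concrete, here's a rough sketch of what the registry part might look like. Everything in it (VirtualNetwork, Endpoint, make_pair) is a made-up name for illustration, not an existing trio API, and it's synchronous and non-blocking to keep it short; a real version would sit behind the stream abstraction.

```python
from collections import deque


class Endpoint:
    """One side of an in-memory connection (hypothetical helper)."""

    def __init__(self):
        self.inbox = deque()  # chunks waiting to be received by this side
        self.peer = None      # filled in by make_pair()

    def send(self, data):
        # Deliver the bytes straight into the other side's inbox.
        self.peer.inbox.append(bytes(data))

    def receive(self):
        # Returns b"" when nothing is pending; a real stream would block here.
        return self.inbox.popleft() if self.inbox else b""


def make_pair():
    """Create two endpoints wired to each other."""
    a, b = Endpoint(), Endpoint()
    a.peer, b.peer = b, a
    return a, b


class VirtualNetwork:
    """Maps (hostname, port) to a queue of not-yet-accepted connections."""

    def __init__(self):
        self._listeners = {}

    def listen(self, host, port):
        if (host, port) in self._listeners:
            raise OSError(f"{host}:{port} is already in use")
        backlog = deque()
        self._listeners[(host, port)] = backlog
        return backlog

    def connect(self, host, port):
        try:
            backlog = self._listeners[(host, port)]
        except KeyError:
            raise ConnectionRefusedError(
                f"nothing listening on {host}:{port}"
            ) from None
        client_side, server_side = make_pair()
        backlog.append(server_side)  # the server picks this up on "accept"
        return client_side


# Usage: no DNS lookup and no real socket is involved at any point.
net = VirtualNetwork()
backlog = net.listen("some-server.example.org", 12345)
client = net.connect("some-server.example.org", 12345)
server = backlog.popleft()
client.send(b"ping")
```

The hostname is just a dictionary key here, which is why the test can use any name it likes.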
Fixing #159 would reduce the amount of monkeypatching needed to do this, but OTOH I guess monkeypatching the whole trio.socket module is probably the simplest and most direct way to do this anyway... or we could hook in at the socket layer (have it check a special flag before allocating a new socket) or at the high-level networking layer (open_tcp_stream checks a special flag and then returns a FakeSocketStream etc.). Fundamentally there's going to be some global state, because no one will put up with passing around the whole library interface as an argument everywhere. Literally every async library has some kind of contextual/global state it uses to solve this problem, and I can't think why it would matter a huge amount whether that's from twisted.internet import reactor vs asyncio._get_running_loop() vs trio.socket.socket(). So I'm leaning towards not worrying about monkeypatching. (The one practical issue I can think of is that if someone is trying to use trio in two threads simultaneously, this will cause problems because the monkeypatch would be global, not thread-local. Maybe we can make it thread-local somehow? Or maybe we just don't care, because there really isn't any good reason to run your test suite multi-threaded in Python.)
Oh, or here's a horrible wonderful idea: embed the fake network into the regular network namespace, so like if you try to bind to 257.1.1.1 or example.trio-fake-tld then the regular functions notice and return faked results (we could even encode test parameters into the name, like getaddrinfo("example.ipv6.trio-fake-tld") returns fake ipv6 addresses...). Of course this would be a bit of a problem for code that wants to like, use the ipaddress library to parse getaddrinfo results. There are the reserved ip address ranges, but that gets dicey because they should give errors in normal use... In practice the solution might be to stick to mostly intercepting things at the hostname level (e.g. open_tcp_stream doesn't even need to resolve anything when it sees a fake hostname), though we do need to have some answer when the user asks for getpeername. I guess we could treat all addresses as regular until someone invokes this functionality with a hostname, at which point some ip addresses become magical.
BUT there would also still very much need to be a magic flag to make sure all this is opt-in at the run loop level, to make sure it could never be accidentally or maliciously invoked in real code, to avoid potential security bugs. At which point I suppose that magic flag could just make all hostnames/addresses magical. Oh well, I said it was a horrible (wonderful) idea :-). The bit about having hostnames determine host properties might still be a good idea.
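A minimal sketch of the hostname-level interception, with the opt-in flag made explicit. The function name and the enabled parameter are hypothetical, and the returned addresses are my own choice: I picked them from the ranges reserved for documentation (192.0.2.0/24 and 2001:db8::/32) so that the ipaddress library can still parse them.

```python
import ipaddress
import socket

FAKE_TLD = ".trio-fake-tld"


def fake_getaddrinfo(host, port, *, enabled):
    """Return fake results for fake hostnames, or None to fall through.

    `enabled` stands in for the run-level opt-in flag: when it is False,
    fake hostnames get no special treatment at all.
    """
    if not enabled or not host.endswith(FAKE_TLD):
        return None  # the caller should use the real resolver

    # Labels before the fake TLD can carry test parameters, e.g. "ipv6".
    labels = host[: -len(FAKE_TLD)].split(".")
    if "ipv6" in labels:
        family = socket.AF_INET6
        sockaddr = ("2001:db8::1", port, 0, 0)
    else:
        family = socket.AF_INET
        sockaddr = ("192.0.2.1", port)

    # Same tuple layout as socket.getaddrinfo() results.
    return [(family, socket.SOCK_STREAM, socket.IPPROTO_TCP, "", sockaddr)]


results = fake_getaddrinfo("example.ipv6.trio-fake-tld", 80, enabled=True)
family, _, _, _, sockaddr = results[0]
parsed = ipaddress.ip_address(sockaddr[0])  # parses fine, unlike 257.1.1.1
```

Returning None for everything else keeps the real resolver in charge of real names, and with enabled=False the fake TLD is just another hostname.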
There's also a big open question about how closely this API should mimic a real network. At the very least it would have to provide the interfaces to do things like set TCP_NODELAY (even as a no-op), for compatibility with code made to run on a real network. But there are also more subtle issues, like, should we simulate the large-but-finite buffers that real sockets have? Our existing in-memory stream implementations have either infinite buffering or zero buffering, both of which are often useful for testing, but neither of which is a great match to how networks actually work... and of course there are also all the usual questions about what kind of API to provide for manipulating the virtual network within a test.
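To illustrate the two compatibility points, here's a small sketch: a buffer with a finite capacity that accepts only as much as fits (roughly how a non-blocking send behaves), and a socket-option store where setsockopt is accepted and remembered but changes nothing. Both classes are hypothetical and the capacity is an arbitrary test parameter, not a claim about real socket buffer sizes.

```python
import socket


class BoundedBuffer:
    """In-memory byte buffer with a finite capacity (hypothetical)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._data = bytearray()

    def send(self, data):
        # Accept only what fits and report how much was taken; the caller
        # has to retry with the rest once the reader has drained some.
        free = self._capacity - len(self._data)
        accepted = bytes(data[:free])
        self._data += accepted
        return len(accepted)

    def receive(self, max_bytes):
        chunk = bytes(self._data[:max_bytes])
        del self._data[:max_bytes]
        return chunk


class FakeSocketOptions:
    """Accepts socket options without acting on them (hypothetical)."""

    def __init__(self):
        self._options = {}

    def setsockopt(self, level, optname, value):
        # Effectively a no-op for the virtual network: just remember it.
        self._options[(level, optname)] = value

    def getsockopt(self, level, optname):
        return self._options.get((level, optname), 0)


buf = BoundedBuffer(capacity=4)
sent = buf.send(b"abcdef")  # only the first 4 bytes fit

opts = FakeSocketOptions()
opts.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```

A test could then shrink the capacity to force the partial-send path that's hard to hit with infinite or zero buffering.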
I suspect that this is a big enough problem and with enough domain-specific open questions that this should be a separate special-purpose library? Though I guess if we want to hook the regular functions without monkeypatching then there will need to be some core API for that.
Prerequisite: We'll need run- or task-local storage (#2) to store the state of the virtual network.