diff --git a/docs/mocking-network-outages-in-rust.md b/docs/mocking-network-outages-in-rust.md
index 5a8ccfd3..998af047 100644
--- a/docs/mocking-network-outages-in-rust.md
+++ b/docs/mocking-network-outages-in-rust.md
@@ -2,75 +2,127 @@

 ## Introduction to the Problem

-Networked servers must be robust against unpredictable conditions – slow or dropped connections, timeouts, partial data transmission, etc. In the `mxd` server, which is an async Rust application, it’s crucial to **simulate network outages** (timeouts, connection resets, partial sends) and verify that the server handles them gracefully. However, inducing such failures reliably is tricky without a controlled test environment.
-
-In this tutorial, we demonstrate how to refactor and test `mxd`’s server components to **simulate unreliable network conditions**. We’ll introduce a transport abstraction to inject simulated failures, and use `tokio-test::io::Builder` for custom I/O streams. We’ll also leverage `rstest` for parameterized tests and `mockall` for mocking, where appropriate. The result will be a suite of tests ensuring `mxd`’s server remains stable even when the network is not.
+Networked servers must be robust against unpredictable conditions – slow or
+dropped connections, timeouts, partial data transmission, etc. In the `mxd`
+server, which is an async Rust application, it’s crucial to **simulate network
+outages** (timeouts, connection resets, partial sends) and verify that the
+server handles them gracefully. However, inducing such failures reliably is
+tricky without a controlled test environment.
+
+In this tutorial, we demonstrate how to refactor and test `mxd`’s server
+components to **simulate unreliable network conditions**. We’ll introduce a
+transport abstraction to inject simulated failures, and use
+`tokio-test::io::Builder` for custom I/O streams. We’ll also leverage `rstest`
+for parameterized tests and `mockall` for mocking, where appropriate.
The result
+will be a suite of tests ensuring `mxd`’s server remains stable even when the
+network is not.

 ## Overview of `mxd`’s Server Networking

-Before diving into the solution, let’s quickly review how `mxd` currently handles network I/O in its server:
+Before diving into the solution, let’s quickly review how `mxd` currently
+handles network I/O in its server:

-- **Connection Acceptance:** The `main.rs` file defines an asynchronous `accept_connections` loop. It listens on a `TcpListener` and spawns a task to handle each accepted `TcpStream`. If `listener.accept()` fails, it logs an error and continues.
+- **Connection Acceptance:** The `main.rs` file defines an asynchronous
+  `accept_connections` loop. It listens on a `TcpListener` and spawns a task to
+  handle each accepted `TcpStream`. If `listener.accept()` fails, it logs an
+  error and continues.

-- **Client Handling:** The core connection logic is in `handle_client` (called in each spawned task). This function performs an **initial handshake** with a client, then enters a loop to read and process transactions. Notably, `handle_client`:
+- **Client Handling:** The core connection logic is in `handle_client` (called
+  in each spawned task). This function performs an **initial handshake** with a
+  client, then enters a loop to read and process transactions. Notably,
+  `handle_client`:

   - Uses `tokio::io::split` to split the `TcpStream` into a reader and writer.

-  - Reads a 12-byte handshake from the client with a **5-second timeout**. If the client doesn’t send the handshake in time, a timeout error code is sent to the client and the connection is closed.
+  - Reads a 12-byte handshake from the client with a **5-second timeout**. If
+    the client doesn’t send the handshake in time, a timeout error code is sent
+    to the client and the connection is closed.

-  - Validates the handshake bytes using the `protocol` module (checking a “TRTP” protocol ID and version).
If invalid, an error code is sent and the connection ends.
+  - Validates the handshake bytes using the `protocol` module (checking a “TRTP”
+    protocol ID and version). If invalid, an error code is sent and the
+    connection ends.

   - On successful handshake, sends back a handshake OK reply and proceeds.

-- **Transaction Loop:** After handshake, `handle_client` creates a `TransactionReader` and `TransactionWriter` (from the `transaction` module) to handle the message framing. It then loops with `tokio::select!`, awaiting either:
+- **Transaction Loop:** After handshake, `handle_client` creates a
+  `TransactionReader` and `TransactionWriter` (from the `transaction` module) to
+  handle the message framing. It then loops with `tokio::select!`, awaiting
+  either:

-  1. **Incoming Transaction:** `tx_reader.read_transaction()` which reads the next complete request frame (possibly composed of multiple fragments). If a transaction is received, it calls `handler::handle_request` to produce a response and writes the response back with `tx_writer.write_transaction`.
+  1. **Incoming Transaction:** `tx_reader.read_transaction()` which reads the
+     next complete request frame (possibly composed of multiple fragments). If a
+     transaction is received, it calls `handler::handle_request` to produce a
+     response and writes the response back with `tx_writer.write_transaction`.

-  2. **Shutdown Signal:** A shared shutdown `watch` channel to break the loop on server shutdown.
+  2. **Shutdown Signal:** A shared shutdown `watch` channel to break the loop on
+     server shutdown.

   The loop’s error handling is important for our tests:

-  - If `read_transaction` returns an **I/O error** of kind *UnexpectedEof* (meaning the client closed the connection), the loop breaks **without error** (graceful termination).
+  - If `read_transaction` returns an **I/O error** of kind *UnexpectedEof*
+    (meaning the client closed the connection), the loop breaks **without
+    error** (graceful termination).
-  - If any other error occurs (e.g. a parsing error, timeout, or a non-EOF I/O error), `handle_client` returns an `Err` to indicate a connection error.
+  - If any other error occurs (e.g. a parsing error, timeout, or a non-EOF I/O
+    error), `handle_client` returns an `Err` to indicate a connection error.

-- **I/O Timeouts:** The `transaction` module imposes a default **5 second I/O timeout** on every read/write operation via `IO_TIMEOUT`. It wraps `AsyncReadExt::read_exact` and `AsyncWriteExt::write_all` calls in `tokio::time::timeout(...)`. For example, reading a frame header uses:
+- **I/O Timeouts:** The `transaction` module imposes a default **5 second I/O
+  timeout** on every read/write operation via `IO_TIMEOUT`. It wraps
+  `AsyncReadExt::read_exact` and `AsyncWriteExt::write_all` calls in
+  `tokio::time::timeout(...)`. For example, reading a frame header uses:

   ```rust
   timeout(timeout_dur, r.read_exact(&mut hdr_buf)).await
       .map_err(|_| TransactionError::Timeout)??;
-
+  ```

-  which yields a `TransactionError::Timeout` on elapsed time, or propagates any underlying I/O error (like EOF) as `TransactionError::Io`. Similarly, frame writes are done with `timeout(..., write_all(...))`. These timeouts will be a focus when simulating slow or stalled connections.
+  which yields a `TransactionError::Timeout` on elapsed time, or propagates any
+  underlying I/O error (like EOF) as `TransactionError::Io`. Similarly, frame
+  writes are done with `timeout(..., write_all(...))`. These timeouts will be a
+  focus when simulating slow or stalled connections.

 With this understanding, we can see the points where network issues manifest:

 - **Handshake stage:** potential timeout or malformed data.

-- **Reading transactions:** timeouts (no data), unexpected EOF (client drop), or other errors (connection reset).
+- **Reading transactions:** timeouts (no data), unexpected EOF (client drop), or
+  other errors (connection reset).
-- **Writing responses:** errors like broken pipe if the client disconnects mid-write, or partial writes.
+- **Writing responses:** errors like broken pipe if the client disconnects
+  mid-write, or partial writes.

-Our goal is to *simulate these conditions in tests*. Next, we’ll refactor the code to allow injecting a fake transport, and then write tests for each failure scenario.
+Our goal is to *simulate these conditions in tests*. Next, we’ll refactor the
+code to allow injecting a fake transport, and then write tests for each failure
+scenario.

 ## Introducing a Testable Transport Abstraction

-Currently, `handle_client` is tied to a real `TcpStream`. To test network failures, we need to run `handle_client` (or its subroutines) with a *simulated stream*. We’ll achieve this by abstracting the transport layer behind a trait or generics, so that in tests we can substitute a mock stream object.
+Currently, `handle_client` is tied to a real `TcpStream`. To test network
+failures, we need to run `handle_client` (or its subroutines) with a *simulated
+stream*. We’ll achieve this by abstracting the transport layer behind a trait or
+generics, so that in tests we can substitute a mock stream object.

-**Refactoring** `handle_client`**:** A straightforward approach is to make `handle_client` generic over the stream’s reader and writer. The Tokio docs suggest writing connection handlers as functions parameterized by `AsyncRead`/`AsyncWrite` implementors, rather than hard-coding `TcpStream`. We can apply this by splitting the logic:
+**Refactoring** `handle_client`**:** A straightforward approach is to make
+`handle_client` generic over the stream’s reader and writer. The Tokio docs
+suggest writing connection handlers as functions parameterized by
+`AsyncRead`/`AsyncWrite` implementors, rather than hard-coding `TcpStream`. We
+can apply this by splitting the logic:

-1.
**Split at the call site:** In `accept_connections`, instead of calling `handle_client(socket, ...)` directly, we first split the socket and then call a new generic handler. For example:
+1. **Split at the call site:** In `accept_connections`, instead of calling
+   `handle_client(socket, ...)` directly, we first split the socket and then
+   call a new generic handler. For example:

    ```rust
    let (reader, writer) = tokio::io::split(socket);
    client_handler(reader, writer, peer, pool, shutdown_rx).await;
-
+   ```

    where `client_handler` is our new generic function.

-2. **Define** `client_handler` **with trait bounds:** It will take any `Reader` and `Writer` that implement the async read/write traits:
+2. **Define** `client_handler` **with trait bounds:** It will take any `Reader`
+   and `Writer` that implement the async read/write traits:

    ```rust
    async fn client_handler(
@@ -86,12 +138,19 @@ Currently, `handle_client` is tied to a real `TcpStream`. To test network failur
    {
        // ... perform handshake, then transaction loop ...
    }
-
+   ```

-   Inside, we use `TransactionReader::new(reader)` and `TransactionWriter::new(writer)` just as before – since those types are generic over any `AsyncRead`/`AsyncWrite`, this works seamlessly. The handshake logic will use the provided `reader` and `writer` as well.
+   Inside, we use `TransactionReader::new(reader)` and
+   `TransactionWriter::new(writer)` just as before – since those types are
+   generic over any `AsyncRead`/`AsyncWrite`, this works seamlessly. The
+   handshake logic will use the provided `reader` and `writer` as well.

-With this change, `client_handler` no longer assumes a real network `TcpStream`; we can pass in any in-memory or mock stream for testing. **Importantly**, the production code doesn’t lose functionality – we still create actual TCP listeners/streams, but we hand off to the generic handler. This refactor maintains the same behavior while enabling injection of test streams.
+With this change, `client_handler` no longer assumes a real network `TcpStream`;
+we can pass in any in-memory or mock stream for testing. **Importantly**, the
+production code doesn’t lose functionality – we still create actual TCP
+listeners/streams, but we hand off to the generic handler. This refactor
+maintains the same behavior while enabling injection of test streams.

 *Example – generic handler signature:*

@@ -159,17 +218,33 @@ where
 }
 ```

-In the above pseudocode, we essentially mirrored the logic from `handle_client`, but on generic `reader`/`writer`. This refactoring sets the stage for injecting **simulated failures** in tests by providing custom `reader`/`writer` types.
+In the above pseudocode, we essentially mirrored the logic from `handle_client`,
+but on generic `reader`/`writer`. This refactoring sets the stage for injecting
+**simulated failures** in tests by providing custom `reader`/`writer` types.

 ## Simulating Network Failures with `tokio-test::io::Builder`

-With the transport abstracted, we can create **dummy streams** to simulate various network outage scenarios. Tokio’s testing utilities include `tokio_test::io::Builder`, which allows building an object that implements `AsyncRead` and `AsyncWrite` with predetermined behavior. We can script a sequence of reads/writes and even inject errors.
+With the transport abstracted, we can create **dummy streams** to simulate
+various network outage scenarios. Tokio’s testing utilities include
+`tokio_test::io::Builder`, which allows building an object that implements
+`AsyncRead` and `AsyncWrite` with predetermined behavior. We can script a
+sequence of reads/writes and even inject errors.

-For example, the Tokio documentation demonstrates using `Builder` to simulate a simple echo conversation by preloading expected inputs and outputs. We will use a similar approach for failure scenarios.
+For example, the Tokio documentation demonstrates using `Builder` to simulate a
+simple echo conversation by preloading expected inputs and outputs. We will use
+a similar approach for failure scenarios.

-**1. Simulating a Handshake Timeout:** In this scenario, the client connects but **never sends the handshake bytes**, causing the server’s 5-second timeout to elapse. To test this without an actual 5-second delay, we can take advantage of Tokio’s ability to **pause time** in tests. By annotating our test with `#[tokio::test(start_paused = true)]`, the Tokio runtime’s clock is frozen at start. We can then `.advance` the clock programmatically to trigger the timeout.
+**1. Simulating a Handshake Timeout:** In this scenario, the client connects but
+**never sends the handshake bytes**, causing the server’s 5-second timeout to
+elapse. To test this without an actual 5-second delay, we can take advantage of
+Tokio’s ability to **pause time** in tests. By annotating our test with
+`#[tokio::test(start_paused = true)]`, the Tokio runtime’s clock is frozen at
+start. We can then `.advance` the clock programmatically to trigger the timeout.

-Using `Builder`, we create a `reader` that yields **no data at all** (so the server will be stuck waiting), and after advancing time past 5 seconds, the handshake read future will time out. We also set up a `writer` to capture the handshake timeout error reply the server should send.
+Using `Builder`, we create a `reader` that yields **no data at all** (so the
+server will be stuck waiting), and after advancing time past 5 seconds, the
+handshake read future will time out. We also set up a `writer` to capture the
+handshake timeout error reply the server should send.

 ```rust
 use tokio_test::io::Builder;
@@ -197,9 +272,21 @@ async fn handshake_times_out() {
 }
 ```

-In the above test, `Builder::new().build()` for the reader yields an I/O object that returns EOF immediately on reads (since no `.read` is queued).
The server’s `read_exact` will wait, but after we advance the virtual clock 5+ seconds, the `timeout` will return `Err`, causing the server to write a timeout error reply. We expect the reply to be 8 bytes (`"TRTP"` + error code 3), which we queued as an expected write. The `test_writer` is configured with `.write(&expected_reply)` to assert that those exact bytes are written. If the server fails to write this or writes different bytes, the test will fail. Finally, we assert that `client_handler` returned `Ok(())` – it should return normally after handling the timeout (not as an error).
-
-**2. Simulating an Invalid Handshake:** Here, the client does send data, but it’s an incorrect handshake (e.g., wrong protocol ID). We expect the server to detect this and send an error reply with code `HANDSHAKE_ERR_INVALID`, then end the connection. Using `Builder`:
+In the above test, `Builder::new().build()` for the reader yields an I/O object
+that returns EOF immediately on reads (since no `.read` is queued). The server’s
+`read_exact` will wait, but after we advance the virtual clock 5+ seconds, the
+`timeout` will return `Err`, causing the server to write a timeout error reply.
+We expect the reply to be 8 bytes (`"TRTP"` + error code 3), which we queued as
+an expected write. The `test_writer` is configured with
+`.write(&expected_reply)` to assert that those exact bytes are written. If the
+server fails to write this or writes different bytes, the test will fail.
+Finally, we assert that `client_handler` returned `Ok(())` – it should return
+normally after handling the timeout (not as an error).
+
+**2. Simulating an Invalid Handshake:** Here, the client does send data, but
+it’s an incorrect handshake (e.g., wrong protocol ID). We expect the server to
+detect this and send an error reply with code `HANDSHAKE_ERR_INVALID`, then end
+the connection.
Using `Builder`:

 ```rust
 #[rstest]
@@ -224,9 +311,21 @@ async fn handshake_invalid_protocol() {
 }
 ```

-In this test, we queue the handshake bytes `"WRNG..."` as the reader input. The server’s `parse_handshake` will return `HandshakeError::InvalidProtocol`. According to `handle_client`, this triggers sending an error reply with code=1 and returning `Ok(())`. Our `test_writer` expects exactly those 8 bytes. We also appended `.read_eof()` after the handshake bytes to indicate the client closed the connection (this ensures the server’s next read sees EOF instead of hanging). The test verifies that `client_handler` completes without propagating an error (it handled the invalid handshake gracefully).
-
-**3. Simulating Client Disconnect During Handshake:** If a client drops the connection midway through the handshake (e.g., sends nothing or partial handshake then disconnects), the server’s `reader.read_exact` will return an `UnexpectedEof` error immediately. The code treats an EOF during handshake as a normal early disconnect and returns `Ok` (no reply sent). We can simulate this by having the test reader immediately return EOF (without sending any bytes):
+In this test, we queue the handshake bytes `"WRNG..."` as the reader input. The
+server’s `parse_handshake` will return `HandshakeError::InvalidProtocol`.
+According to `handle_client`, this triggers sending an error reply with code=1
+and returning `Ok(())`. Our `test_writer` expects exactly those 8 bytes. We also
+appended `.read_eof()` after the handshake bytes to indicate the client closed
+the connection (this ensures the server’s next read sees EOF instead of
+hanging). The test verifies that `client_handler` completes without propagating
+an error (it handled the invalid handshake gracefully).
+
+**3.
Simulating Client Disconnect During Handshake:** If a client drops the
+connection midway through the handshake (e.g., sends nothing or partial
+handshake then disconnects), the server’s `reader.read_exact` will return an
+`UnexpectedEof` error immediately. The code treats an EOF during handshake as a
+normal early disconnect and returns `Ok` (no reply sent). We can simulate this
+by having the test reader immediately return EOF (without sending any bytes):

 ```rust
 #[tokio::test]
@@ -241,19 +340,38 @@ async fn handshake_client_disconnect() {
 }
 ```

-Here, `test_writer` expects no writes (we didn’t call `.write()` on it). If the server mistakenly attempted to send something, the test would catch an unexpected write. We assert the handler returns `Ok`, meaning it handled the disconnect silently.
+Here, `test_writer` expects no writes (we didn’t call `.write()` on it). If the
+server mistakenly attempted to send something, the test would catch an
+unexpected write. We assert the handler returns `Ok`, meaning it handled the
+disconnect silently.

-**4. Simulating a Read Timeout During Transactions:** After a successful handshake, if the client stops sending data in the middle of a transaction, the server’s `TransactionReader` will eventually hit the 5-second `IO_TIMEOUT` on a `read_exact`. This produces `TransactionError::Timeout` which propagates out of `read_transaction`. In the `handle_client` loop, any error that isn’t an EOF leads to an `Err` return. We want to test that a stalled connection causes a timeout error and that our code handles it as expected (likely logging and closing the connection).
+**4. Simulating a Read Timeout During Transactions:** After a successful
+handshake, if the client stops sending data in the middle of a transaction, the
+server’s `TransactionReader` will eventually hit the 5-second `IO_TIMEOUT` on a
+`read_exact`. This produces `TransactionError::Timeout` which propagates out of
+`read_transaction`.
In the `handle_client` loop, any error that isn’t an EOF
+leads to an `Err` return. We want to test that a stalled connection causes a
+timeout error and that our code handles it as expected (likely logging and
+closing the connection).

 Simulating this involves a two-part interaction:

 - **Handshake phase:** Send a valid handshake to proceed.

-- **Transaction phase:** Send a partial transaction (e.g., send only a frame header indicating more data to come, then stall).
+- **Transaction phase:** Send a partial transaction (e.g., send only a frame
+  header indicating more data to come, then stall).

-Using `Builder`, we can script the reader to first provide a correct handshake, then provide one frame header and no payload. For instance, suppose we craft a frame header with `total_size = 100` and `data_size = 50` for the first fragment, but we never send the remaining fragment bytes. The server will read the header and 50 bytes of payload, then expect another frame (because `remaining = total_size - data_size` is not zero). If we don’t send the next fragment, `read_frame` will timeout on the next `read_exact` for the header of fragment 2.
+Using `Builder`, we can script the reader to first provide a correct handshake,
+then provide one frame header and no payload. For instance, suppose we craft a
+frame header with `total_size = 100` and `data_size = 50` for the first
+fragment, but we never send the remaining fragment bytes. The server will read
+the header and 50 bytes of payload, then expect another frame (because
+`remaining = total_size - data_size` is not zero). If we don’t send the next
+fragment, `read_frame` will timeout on the next `read_exact` for the header of
+fragment 2.

-Rather than actually waiting 5 seconds, we can again use `start_paused` and advance time. For brevity, we may pseudo-code this test:
+Rather than actually waiting 5 seconds, we can again use `start_paused` and
+advance time.
For brevity, we may pseudo-code this test:

 ```rust
 #[tokio::test(start_paused = true)]
@@ -297,9 +415,17 @@ async fn transaction_read_timeout() {
 }
 ```

-In this test, after sending one fragment, the server will be awaiting the next fragment. By advancing the clock past `IO_TIMEOUT` (5s) with no more data, the next `read_frame` call should time out, causing `read_transaction` to return `TransactionError::Timeout`. Our handler then returns an error (which we assert). We also ensure the handshake was completed successfully by expecting the handshake OK reply on the writer.
+In this test, after sending one fragment, the server will be awaiting the next
+fragment. By advancing the clock past `IO_TIMEOUT` (5s) with no more data, the
+next `read_frame` call should time out, causing `read_transaction` to return
+`TransactionError::Timeout`. Our handler then returns an error (which we
+assert). We also ensure the handshake was completed successfully by expecting
+the handshake OK reply on the writer.

-**5. Simulating Connection Reset During Read:** A connection reset (e.g., TCP RST) would typically surface as an I/O error other than EOF. We can simulate this by configuring the test reader to return an error on read. For example, using `Builder`’s ability to inject errors:
+**5. Simulating Connection Reset During Read:** A connection reset (e.g., TCP
+RST) would typically surface as an I/O error other than EOF. We can simulate
+this by configuring the test reader to return an error on read. For example,
+using `Builder`’s ability to inject errors:

 ```rust
 use std::io::ErrorKind;
@@ -309,11 +435,23 @@ let test_reader = Builder::new()
     .build();
 ```

-Here, after the handshake, any attempt by the server to read further will immediately get a `ConnectionReset` error. In `handle_client`, this is caught by the generic `Err(e)` arm (not EOF), and the function will return an error. We can assert that result is an `Err` and matches the expected kind.
+Here, after the handshake, any attempt by the server to read further will
+immediately get a `ConnectionReset` error. In `handle_client`, this is caught by
+the generic `Err(e)` arm (not EOF), and the function will return an error. We
+can assert that result is an `Err` and matches the expected kind.

-**6. Simulating Partial Write Failures:** So far, we’ve focused on read-side issues. But what if the server fails while **writing** to the client (for instance, the client disconnects just as the server sends a response)? In such a case, `TransactionWriter::write_transaction` might return an error (likely a broken pipe). Our handler would propagate that error out.
+**6. Simulating Partial Write Failures:** So far, we’ve focused on read-side
+issues. But what if the server fails while **writing** to the client (for
+instance, the client disconnects just as the server sends a response)? In such a
+case, `TransactionWriter::write_transaction` might return an error (likely a
+broken pipe). Our handler would propagate that error out.

-To test this, we need a writer that simulates an error on write. The `Builder` can expect writes and also inject errors. One strategy is to have the writer expect part of a write and then error out on the next write. However, since `TransactionWriter` uses `write_all` internally, it will loop until all bytes are written or an error occurs. We can force an error on the *first* write call to simulate an immediate failure. For example:
+To test this, we need a writer that simulates an error on write. The `Builder`
+can expect writes and also inject errors. One strategy is to have the writer
+expect part of a write and then error out on the next write. However, since
+`TransactionWriter` uses `write_all` internally, it will loop until all bytes
+are written or an error occurs. We can force an error on the *first* write call
+to simulate an immediate failure.
For example:

 ```rust
 let test_writer = Builder::new()
@@ -321,9 +459,13 @@ let test_writer = Builder::new()
     .build();
 ```

-If we run a scenario where the server will definitely attempt a write (e.g., after a successful handshake and processing a transaction), this writer will cause that write to fail.
+If we run a scenario where the server will definitely attempt a write (e.g.,
+after a successful handshake and processing a transaction), this writer will
+cause that write to fail.

-*Example test for write failure:* Suppose we send a valid handshake and then a valid request transaction. We set the writer to error on the response write. We expect `client_handler` to return an error. For conciseness, here’s a sketch:
+*Example test for write failure:* Suppose we send a valid handshake and then a
+valid request transaction. We set the writer to error on the response write. We
+expect `client_handler` to return an error. For conciseness, here’s a sketch:

 ```rust
 #[tokio::test]
@@ -342,15 +484,28 @@ async fn response_write_failure() {
 }
 ```

-In this test, after reading the request, the server will call `tx_writer.write_transaction(&resp)`. The first `.write_all` on the handshake reply succeeded, but the next `.write_all` for the response triggers our injected BrokenPipe error. This error propagates and we assert that `client_handler` returns `Err`. (We don’t necessarily need to assert on the exact error kind in the result, but we could.)
-By combining these techniques – **custom readers/writers via** `tokio-test::io::Builder` and **Tokio’s paused time** – we create a deterministic test suite for network failures. +By combining these techniques – **custom readers/writers via** +`tokio-test::io::Builder` and **Tokio’s paused time** – we create a +deterministic test suite for network failures. ## Parameterizing Tests with `rstest` -As the above examples show, many scenarios follow a similar pattern of setup and assertion. We can use the `rstest` crate to avoid repetitive code by parameterizing the scenarios. The `#[rstest]` attribute allows us to define multiple cases for a single test function. +As the above examples show, many scenarios follow a similar pattern of setup and +assertion. We can use the `rstest` crate to avoid repetitive code by +parameterizing the scenarios. The `#[rstest]` attribute allows us to define +multiple cases for a single test function. -For instance, we might create a single test function `test_network_outage_scenarios` with parameters indicating the scenario type. Each case would configure the test reader/writer and expected outcome accordingly: +For instance, we might create a single test function +`test_network_outage_scenarios` with parameters indicating the scenario type. +Each case would configure the test reader/writer and expected outcome +accordingly: ```rust use rstest::rstest; @@ -402,40 +557,116 @@ async fn test_network_outage_scenarios(scenario: Scenario) { } ``` -Above, each `case(...)` provides a different `Scenario` variant. The test builds the appropriate `test_reader`/`test_writer` and then invokes `client_handler`. We use a `should_error` flag to assert the expected outcome. This single parametrized test replaces multiple individual tests, reducing duplication. All scenarios still run in isolation with distinct setups, thanks to `rstest`. +Above, each `case(...)` provides a different `Scenario` variant. 
The test builds
+the appropriate `test_reader`/`test_writer` and then invokes `client_handler`.
+We use a `should_error` flag to assert the expected outcome. This single
+parametrized test replaces multiple individual tests, reducing duplication. All
+scenarios still run in isolation with distinct setups, thanks to `rstest`.

 ## Using `mockall` for Additional Flexibility

-While `tokio-test::io::Builder` covers most needs, there are situations where explicit mocking might be useful. The `mockall` crate can generate mocks for our abstractions. For example, if we had defined a trait `trait Transport: AsyncRead + AsyncWrite + Unpin {}` (or a trait with specific async methods for read/write), we could use `mockall` to create a `MockTransport` and program its behavior (return errors on certain calls, etc.).
-
-However, mocking `AsyncRead/Write` directly can be complex. An easier target for mocking might be higher-level components:
-
-- **Accept Loop Simulation:** We could define a trait for the listener (e.g., `trait Listener { async fn accept(&self) -> io::Result<(Box, SocketAddr)> }`). Using `mockall`, we could simulate a listener that returns a predefined sequence of connections or errors. This way, one could test how `accept_connections` reacts to, say, a series of successful accepts followed by an error or immediate shutdown. For instance, a mock listener could be set to return one `MockTransport` (representing a client) and then an `Err` to simulate a network interface error. The test would then verify that `accept_connections` logs the error and continues or exits properly.
+While `tokio-test::io::Builder` covers most needs, there are situations where
+explicit mocking might be useful. The `mockall` crate can generate mocks for our
+abstractions.
For example, if we had defined a trait +`trait Transport: AsyncRead + AsyncWrite + Unpin {}` (or a trait with specific +async methods for read/write), we could use `mockall` to create a +`MockTransport` and program its behavior (return errors on certain calls, etc.). -- **Isolating Business Logic:** In our `client_handler` tests above, we mostly ignored the actual `handle_request` logic by using dummy minimal transactions. If we wanted to focus purely on the network layer and not depend on real database calls or command processing, we could abstract the request handling. For example, introduce an interface `trait RequestHandler { async fn handle(&mut self, req: Transaction) -> Result }`, implement it with the real logic for production, and use a mock in tests that just returns a canned response. This way, in a test for a write failure, we don’t invoke the real DB or commands at all – the mock could simply return a simple “OK” response transaction when called. Then we only simulate the network failing on sending that response. Such a mock ensures our test is laser-focused on networking behavior. +However, mocking `AsyncRead/Write` directly can be complex. An easier target for +mocking might be higher-level components: -In summary, **use** `mockall` **when you need to stub out parts of the system that are not the primary target of the test**. For testing network outages in `mxd`, we found we could largely rely on `tokio-test` to simulate the transport. But if, for example, database access or external services were intertwined with the network handling, mocking them out would be essential to create repeatable unit tests. 
- -## Async Testing Best Practices and Final Thoughts +- **Accept Loop Simulation:** We could define a trait for the listener: -Our refactoring and tests align with async Rust best practices in several ways: - -- **Deterministic Timing:** By using `#[tokio::test(start_paused)]` and controlling the clock, we avoid making tests that actually sleep for seconds. This speeds up the test suite and avoids flakiness due to timing. Always ensure that any time-based logic (like `timeout` calls) in your code can be fast-forwarded in tests. - -- **Trait-Based Abstraction:** Introducing a trait or generic interface for the transport layer follows the dependency-inversion principle. It not only makes testing easier but also means the server code could be extended to other transport types (imagine swapping `TcpStream` with a TLS stream or an in-memory channel) without changing the core logic. This decoupling is a win for maintainability. + ```rust + trait Listener { + async fn accept(&self) -> io::Result<(Box, SocketAddr)>; + } + ``` -- **Using In-Memory Channels:** We saw that `tokio::io::duplex` was already used in `mxd`’s own tests for normal conditions. We built on that idea with `tokio-test::io::Builder` to handle error cases. Both provide lightweight in-process channels that behave like network streams, which is far preferable to spawning real sockets in unit tests. + Using `mockall`, we could simulate a listener that returns a predefined + sequence of connections or errors. This way, one could test how + `accept_connections` reacts to, say, a series of successful accepts followed + by an error or immediate shutdown. For instance, a mock listener could be set + to return one `MockTransport` (representing a client) and then an `Err` to + simulate a network interface error. The test would then verify that + `accept_connections` logs the error and continues or exits properly. 
-- **Granular Testing:** Rather than trying to simulate a full multi-connection environment at once, we tested one connection at a time in various failure modes. This isolates issues and makes tests simpler. The use of `JoinSet` in `accept_connections` and multi-task concurrency is tested indirectly by these unit tests, but you might also consider integration tests or an end-to-end test (spinning up the server and connecting with a real socket) for additional confidence. Those, however, are typically slower and less deterministic, so unit tests like we’ve written are invaluable for covering edge cases. +- **Isolating Business Logic:** In our `client_handler` tests above, we mostly + ignored the actual `handle_request` logic by using dummy minimal transactions. + If we wanted to focus purely on the network layer and not depend on real + database calls or command processing, we could abstract the request handling. + For example, introduce an interface: -By following this tutorial, you can confidently extend the `mxd` test suite. We demonstrated how to simulate timeouts, abrupt disconnects, and I/O errors for both reads and writes. With parameterized tests and careful use of mocks, the server’s resilience under adverse network conditions can be validated thoroughly. This not only prevents regressions but also documents the intended behavior (for example, that a timeout should result in a specific error code to the client, or that an EOF is treated as a graceful shutdown). + ```rust + trait RequestHandler { + async fn handle(&mut self, req: Transaction) -> Result; + } + ``` -**In conclusion**, testing for network outages in async Rust requires a mix of clever abstractions and tools: + implement it with the real logic for production, and use a mock in tests that + just returns a canned response. This way, in a test for a write failure, we + don’t invoke the real DB or commands at all – the mock could simply return a + simple “OK” response transaction when called. 
Then we only simulate the + network failing on sending that response. Such a mock ensures our test is + laser-focused on networking behavior. -- **Refactor for testability:** make the code accept fake implementations of I/O. +In summary, **use** `mockall` **when stubbing out parts of the system that are +not the primary target of the test**. For testing network outages in `mxd`, the +tests largely rely on `tokio-test` to simulate the transport. However, if, for +example, database access or external services were intertwined with the network +handling, mocking them out would be essential to create repeatable unit tests. -- **Use tokio’s test tools:** for simulating I/O and controlling time. +## Async Testing Best Practices and Final Thoughts -- **Employ rstest and mockall:** to keep tests clean, avoid repetition, and isolate concerns. +Our refactoring and tests align with async Rust best practices in several ways: -With these techniques, `mxd`’s server is well-equipped to handle the messy reality of networks, and we have high-confidence tests to prove it. +- **Deterministic Timing:** By using `#[tokio::test(start_paused = true)]` and + controlling the clock, we avoid making tests that actually sleep for seconds. + This speeds up the test suite and avoids flakiness due to timing. Always + ensure that any time-based logic (like `timeout` calls) in the code can be + fast-forwarded in tests. + +- **Trait-Based Abstraction:** Introducing a trait or generic interface for the + transport layer follows the dependency-inversion principle. It not only makes + testing easier but also means the server code could be extended to other + transport types (imagine swapping `TcpStream` with a TLS stream or an + in-memory channel) without changing the core logic. This decoupling is a win + for maintainability. + +- **Using In-Memory Channels:** We saw that `tokio::io::duplex` was already used + in `mxd`’s own tests for normal conditions.
We built on that idea with + `tokio-test::io::Builder` to handle error cases. Both provide lightweight + in-process channels that behave like network streams, which is far preferable + to spawning real sockets in unit tests. + +- **Granular Testing:** Rather than trying to simulate a full multi-connection + environment at once, we tested one connection at a time in various failure + modes. This isolates issues and makes tests simpler. The use of `JoinSet` in + `accept_connections` and multi-task concurrency is tested indirectly by these + unit tests, though integration tests or an end-to-end test (spinning up the + server and connecting with a real socket) may also be considered for + additional confidence. Those, however, are typically slower and less + deterministic, so unit tests like we’ve written are invaluable for covering + edge cases. + +Following this tutorial enables confident extension of the `mxd` test suite. We +demonstrated how to simulate timeouts, abrupt disconnects, and I/O errors for +both reads and writes. With parameterized tests and careful use of mocks, the +server’s resilience under adverse network conditions can be validated +thoroughly. This not only prevents regressions but also documents the intended +behavior (for example, that a timeout should result in a specific error code to +the client, or that an EOF is treated as a graceful shutdown). + +**In conclusion**, testing for network outages in async Rust requires a mix of +clever abstractions and tools: + +- **Refactor for testability:** ensure the code accepts fake implementations of + I/O. + +- **Tokio’s test tools** simulate I/O and control time. + +- **Rstest and mockall** keep tests clean, avoid repetition, and isolate + concerns. + +With these techniques, `mxd`’s server is well-equipped to handle the messy +reality of networks, and we have high-confidence tests to prove it.
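The "refactor for testability" point can be illustrated with a small, standard-library-only sketch. It is a synchronous analogue of the idea, not `mxd`'s actual code: the names `Step`, `ScriptedReader`, and `read_handshake` are hypothetical, and a Tokio version would be generic over `AsyncRead` and use the `tokio-test` I/O builder instead. The only details taken from this document are the 12-byte handshake and the "TRTP" protocol ID.

```rust
use std::collections::VecDeque;
use std::io::{self, ErrorKind, Read};

/// Handshake length described in this document.
const HANDSHAKE_LEN: usize = 12;

/// One scripted step of a fake connection (hypothetical helper type).
enum Step {
    /// Bytes the fake peer delivers; may arrive across several reads.
    Data(Vec<u8>),
    /// An I/O failure injected at this point in the stream.
    Fail(ErrorKind),
}

/// A fake reader that replays scripted steps, then reports EOF.
struct ScriptedReader {
    steps: VecDeque<Step>,
}

impl ScriptedReader {
    fn new(steps: Vec<Step>) -> Self {
        Self { steps: steps.into() }
    }
}

impl Read for ScriptedReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        match self.steps.pop_front() {
            // Script exhausted: behave like a closed connection (EOF).
            None => Ok(0),
            // Injected failure, e.g. a connection reset.
            Some(Step::Fail(kind)) => Err(io::Error::from(kind)),
            Some(Step::Data(mut bytes)) => {
                let n = bytes.len().min(buf.len());
                buf[..n].copy_from_slice(&bytes[..n]);
                if n < bytes.len() {
                    // Keep the unread tail for the next read call.
                    self.steps.push_front(Step::Data(bytes.split_off(n)));
                }
                Ok(n)
            }
        }
    }
}

/// Reads a fixed-size handshake from any reader, real or fake.
/// Being generic over `Read` is what allows failures to be injected.
fn read_handshake<R: Read>(reader: &mut R) -> io::Result<[u8; HANDSHAKE_LEN]> {
    let mut buf = [0u8; HANDSHAKE_LEN];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}
```

With this shape, a test can script a partial send followed by the remaining bytes, a reset in the middle of the handshake, or an early EOF, and assert on the resulting `io::ErrorKind`. Because `read_handshake` never names a concrete socket type, the same function would work unchanged with a real `TcpStream`.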