Async client: fix silent data loss on the final frame of a server stream #312
shvbsle wants to merge 1 commit into containerd:master
Conversation
On CI failures: The CI build failures are unrelated to the code changes. Cargo 1.81 cannot resolve the current crate ecosystem because new crate versions now require edition2024. Seems like infrastructure rot.
You can update rust-toolchain.toml to 1.95 to fix this. |
I had a PR for this, but it's not as simple as updating the toolchain: the newer versions introduce new clippy issues that need fixing first. I can cherry-pick or rebase this PR after the above one is merged.
LGTM. At the time of implementation, it was simply designed to behave exactly like the Go version, even though the Go version might be incorrect... |
ClientReader::handle_msg spawns a new task per frame. For a server-streaming RPC, the final DATA frame and the subsequent FLAG_REMOTE_CLOSED frame then race: if the close-frame task grabs the req_map lock first, it removes the stream from the map, and the preceding data-frame task finds nothing and silently drops the payload. The stream consumer sees Ok(None) (EOF) without ever observing the payload the server sent.

The connection read loop already awaits handle_msg per frame, so processing inline preserves per-stream wire order. It also gives the per-stream mpsc natural back-pressure in place of the unbounded per-frame spawning. handle_err gets the same treatment for consistency.

Also update example/async-stream-client.rs so the existing echo_default_value example exercises the race: wrap it in a 1000-iteration loop and flip the client runtime to multi_thread. On a single-threaded runtime the tokio scheduler masks the race because spawned tasks run in submission order. Without the fix, the modified example fails on any multi-core runner; with the fix, it passes.

Signed-off-by: Shiv Bhosale <shvbsle@amazon.com>
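The shape of the bug, as a minimal sketch. The `Frame` struct, `ReqMap` alias, and flag value below are simplified, hypothetical stand-ins, not the actual ttrpc-rust definitions:

```rust
use std::{collections::HashMap, sync::Arc};
use tokio::sync::{mpsc, Mutex};

// Illustrative stand-ins; the real frame and flag definitions live in ttrpc-rust.
const FLAG_REMOTE_CLOSED: u8 = 0x02;

struct Frame {
    stream_id: u32,
    flags: u8,
    payload: Vec<u8>,
}

type ReqMap = Arc<Mutex<HashMap<u32, mpsc::Sender<Vec<u8>>>>>;

// Racy shape: one detached task per frame. Two frames for the same
// stream land in independent tasks, so their lock-acquisition order
// is unspecified.
fn handle_msg_racy(req_map: ReqMap, frame: Frame) {
    tokio::spawn(async move {
        let mut map = req_map.lock().await;
        if frame.flags & FLAG_REMOTE_CLOSED != 0 {
            // If the close frame's task wins the race, the entry is
            // gone before the final DATA frame's task looks it up...
            map.remove(&frame.stream_id);
        } else if let Some(tx) = map.get(&frame.stream_id) {
            let _ = tx.send(frame.payload).await;
        }
        // ...and that payload is silently dropped: the consumer's
        // receiver just reports EOF.
    });
}
```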
Fixes #311
Process each frame inline in `ClientReader::handle_msg` instead of `tokio::spawn`-ing per frame. The read loop in `connection.rs` already awaits `handle_msg` per frame, so inline processing preserves per-stream wire order and restores natural back-pressure on the per-stream mpsc. `handle_err` is adjusted the same way for consistency.

Server dispatch also spawns per frame, but there the spawn is load-bearing (user handlers can take arbitrary time) and a oneshot already preserves ordering, so it is left alone.
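The shape of the fix, continuing the same sketch as above: the read loop awaits the handler, so frames are applied strictly in wire order, and a full per-stream channel now slows the reader down instead of piling up spawned tasks.

```rust
// Fixed shape: handle each frame inline and await it from the read
// loop, so at most one frame per connection is in flight.
async fn handle_msg_inline(req_map: ReqMap, frame: Frame) {
    let mut map = req_map.lock().await;
    if frame.flags & FLAG_REMOTE_CLOSED != 0 {
        map.remove(&frame.stream_id);
    } else if let Some(tx) = map.get(&frame.stream_id) {
        // Awaiting here (rather than in a detached task) is what lets
        // a full per-stream mpsc exert back-pressure on the reader.
        let _ = tx.send(frame.payload).await;
    }
}

async fn read_loop(req_map: ReqMap, mut frames: mpsc::Receiver<Frame>) {
    while let Some(frame) = frames.recv().await {
        handle_msg_inline(req_map.clone(), frame).await;
    }
}
```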
Verification
See the reproducer in #311. With that patch applied on unfixed upstream, the example panics on `stream.recv().await.unwrap().unwrap()` within the 1000-iteration loop; with this PR applied, the same run is clean.
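As an illustration of why the runtime flavor matters, here is a self-contained toy repro reusing the sketch types above. It is not the real reproducer, which is the 1000-iteration `echo_default_value` loop in `example/async-stream-client.rs`:

```rust
// On a multi_thread runtime the two spawned tasks can run in either
// order, so the final payload is sometimes lost; current_thread runs
// spawned tasks in submission order and masks the bug.
#[tokio::main(flavor = "multi_thread")]
async fn main() {
    let mut lost = 0u32;
    for _ in 0..1000 {
        let (tx, mut rx) = mpsc::channel::<Vec<u8>>(1);
        let req_map: ReqMap = Arc::new(Mutex::new(HashMap::from([(1, tx)])));

        // Final DATA frame, then the close frame, in wire order.
        handle_msg_racy(req_map.clone(), Frame { stream_id: 1, flags: 0, payload: b"last".to_vec() });
        handle_msg_racy(req_map, Frame { stream_id: 1, flags: FLAG_REMOTE_CLOSED, payload: vec![] });

        // If the close task won the lock first, the sender was dropped
        // before the payload was sent, and recv() reports EOF (None).
        if rx.recv().await.is_none() {
            lost += 1;
        }
    }
    println!("payloads lost to the race: {lost}/1000");
}
```

Replacing the two spawning calls with sequential `handle_msg_inline(...).await` calls drives the loss count to zero by construction.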
Compatibility
Public API, wire format, and observable behavior for correctly-behaving consumers are unchanged.