Async client: fix silent data loss on the final frame of a server stream #312

Open

shvbsle wants to merge 1 commit into containerd:master from shvbsle:ttrpc-bug

Conversation

shvbsle (Contributor) commented Apr 18, 2026

Fixes #311

Process each frame inline in ClientReader::handle_msg instead of tokio::spawn-ing a task per frame. The read loop in connection.rs already awaits handle_msg for each frame, so handling frames inline preserves per-stream wire order and restores natural back-pressure on the per-stream mpsc channel. handle_err is adjusted the same way for consistency.

Server dispatch also spawns per frame, but there the spawn is load-bearing (user handlers can take arbitrarily long) and a oneshot channel already preserves ordering, so it is left alone.
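To make the ordering claim concrete, here is a minimal, self-contained sketch (hypothetical stand-ins, not ttrpc code: plain integers play the role of frames and an mpsc channel plays the per-stream queue) showing why a task per frame can reorder delivery while awaiting inline cannot:

```rust
use tokio::sync::mpsc;

#[tokio::main(flavor = "multi_thread")]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<u32>(16);

    // Old shape: one spawned task per "frame". The scheduler may run these
    // tasks in any order, so frame 3 can be delivered before frame 1.
    for frame in [1u32, 2, 3] {
        let tx = tx.clone();
        tokio::spawn(async move {
            let _ = tx.send(frame).await;
        });
    }

    // New shape: await each "frame" inline in the loop that read it.
    // Delivery order matches read order, and a full channel suspends this
    // loop, giving back-pressure instead of unbounded task growth.
    for frame in [4u32, 5, 6] {
        let _ = tx.send(frame).await;
    }

    drop(tx);
    while let Some(v) = rx.recv().await {
        println!("{v}"); // 1-3 may interleave arbitrarily; 4-6 stay in order
    }
}
```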

Verification

See the reproducer in #311. With that reproducer patch applied on unfixed upstream, the example panics on stream.recv().await.unwrap().unwrap() within the 1000-iteration loop; with this PR applied, the same run is clean.

Compatibility

Public API, wire format, and observable behavior for correctly behaving consumers are unchanged.

shvbsle (Contributor, Author) commented Apr 20, 2026

On CI failures: the build failures are unrelated to the code changes. Cargo 1.81 cannot resolve the current crate ecosystem because newer crate versions now require edition2024. Seems like infrastructure rot.

Comment thread on src/asynchronous/client.rs (outdated)

fletcherw commented:

> On CI failures: the build failures are unrelated to the code changes. Cargo 1.81 cannot resolve the current crate ecosystem because newer crate versions now require edition2024. Seems like infrastructure rot.

You can update rust-toolchain.toml to 1.95 to fix this.
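For reference, that change would look roughly like the following (a sketch of rust-toolchain.toml; the repo's actual file may pin additional keys such as components or profile):

```toml
[toolchain]
channel = "1.95"
```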

shvbsle (Contributor, Author) commented Apr 20, 2026

> You can update rust-toolchain.toml to 1.95 to fix this.

I had a PR for this, but it's not as simple as updating the toolchain: the newer toolchain introduces new clippy lints that need fixing first:
#313

I can cherry-pick or rebase this PR after the above one is merged.

wllenyj (Collaborator) commented Apr 22, 2026

LGTM.

At the time of implementation, it was simply designed to behave exactly like the Go version, even though the Go version might be incorrect...

ClientReader::handle_msg spawns a new task per frame. For a
server-streaming RPC, the final DATA frame and the subsequent
FLAG_REMOTE_CLOSED frame then race: if the close-frame task grabs
the req_map lock first, it removes the stream from the map, and
the preceding data-frame task finds nothing and silently drops
the payload. The stream consumer sees Ok(None) (EOF) without
ever observing the payload the server sent.

The connection read loop already awaits handle_msg per frame, so
processing inline preserves per-stream wire order. It also gives
the per-stream mpsc natural back-pressure in place of the
unbounded per-frame spawning.

handle_err gets the same treatment for consistency.

Also update example/async-stream-client.rs so the existing
echo_default_value example exercises the race: wrap it in a
1000-iteration loop and flip the client runtime to multi_thread.
On a single-threaded runtime the tokio scheduler masks the race
because spawned tasks run in submission order. Without the fix,
the modified example fails on any multi-core runner; with the
fix, it passes.

Signed-off-by: Shiv Bhosale <shvbsle@amazon.com>
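The race described above can be modeled in isolation. Below is a self-contained sketch using simplified stand-ins (a Mutex-guarded HashMap in place of the real req_map, a string payload in place of a DATA frame); it is not the library's actual code, but it shows how spawn-per-frame lets the close frame win the lock and drop the final payload:

```rust
use std::collections::HashMap;
use std::sync::Arc;

use tokio::sync::{mpsc, Mutex};

#[tokio::main(flavor = "multi_thread")]
async fn main() {
    for i in 0..1000u32 {
        let (tx, mut rx) = mpsc::channel::<&'static str>(8);
        // Stand-in for the client's req_map: stream id -> per-stream sender.
        let req_map = Arc::new(Mutex::new(HashMap::from([(1u32, tx)])));

        // Spawn-per-frame: the final DATA frame and the close frame are
        // handled in separate tasks, so they race for the map lock.
        let m = req_map.clone();
        let data = tokio::spawn(async move {
            // DATA frame: deliver the payload if the stream is still mapped.
            let sender = m.lock().await.get(&1).cloned();
            if let Some(tx) = sender {
                let _ = tx.send("payload").await;
            } // else: the payload is silently dropped -- the bug.
        });
        let m = req_map.clone();
        let close = tokio::spawn(async move {
            // FLAG_REMOTE_CLOSED: remove the stream. Dropping the sender
            // makes the consumer's recv() return None (EOF).
            m.lock().await.remove(&1);
        });
        let _ = tokio::join!(data, close);

        if rx.recv().await.is_none() {
            println!("iteration {i}: saw EOF before the final payload");
            return;
        }
    }
    println!("payload delivered every time (race not hit on this run)");
}
```

Handling both frames inline in the read loop removes the race entirely: the DATA frame's send completes before the close frame's removal even runs.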

tzneal left a comment:

lgtm

dims (Member) commented May 4, 2026

@tzneal @shvbsle can the reproducer be turned into a test case perhaps? worth it?


Linked issue: Async client silently drops the final DATA frame of a server-streaming RPC (#311)