
Harden connection state machine: cap header block, monitor owner, propagate send errors#2

Merged
benoitc merged 1 commit into main from harden-connection-state-machine
May 19, 2026
Conversation


@benoitc benoitc commented May 19, 2026

Summary

Three correctness fixes in h2_connection, surfaced by a security/concurrency audit of the codebase:

  • CONTINUATION flood (Critical, RFC 9113 §10.5 / §6.5.2): handle_continuation/4 appended every CONTINUATION payload to a per-stream iolist with no size cap, and max_header_list_size is only enforced after HPACK decoding. A peer streaming CONTINUATION frames could OOM the node before the limit fires. Track raw bytes on the stream and bail with GOAWAY(ENHANCE_YOUR_CALM) past 256 KB.
  • controlling_process/2 owner liveness (Critical) — the swap updated state.owner but left the new owner untracked; the new owner's death produced no signal and the connection orphaned the socket. Install a monitor on the new owner, demonitor the previous one. The initial owner still uses the start_link bidirectional link.
  • send_frame/2 error propagation (High) — always returned ok even after Transport:send returned {error, closed}. send_request / send_response / send_data / send_trailers / send_request_headers replied ok to the caller after the wire died. Now send_frame/2 returns the real result; API reply paths stop with {shutdown, {send_failed, Reason}} and propagate {error, Reason} to the in-flight caller. Internal async sites (window updates, settings) stay best-effort since the imminent tcp_closed / ssl_closed surfaces the failure anyway.
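
For reviewers, the raw-byte cap from the first item can be sketched roughly as below. This is an illustration, not the code in h2_connection: the module name, the #block{} record and append_fragment/2 are hypothetical, and only the 256 KB threshold and the ENHANCE_YOUR_CALM outcome come from the description above.

```erlang
-module(header_block_cap_sketch).
-export([new/0, append_fragment/2]).

%% Cap from the PR description: 256 KB of raw, still HPACK-encoded bytes.
-define(MAX_HEADER_BLOCK_BYTES, (256 * 1024)).

%% Hypothetical per-stream accumulator; the real stream record is not shown in the PR.
-record(block, {fragments = [] :: iolist(),
                bytes = 0 :: non_neg_integer()}).

new() ->
    #block{}.

%% Append one HEADERS/CONTINUATION fragment, checking the running raw size
%% before the fragment is buffered or handed to the HPACK decoder.
-spec append_fragment(binary(), #block{}) ->
    {ok, #block{}} | {error, enhance_your_calm}.
append_fragment(Fragment, #block{fragments = Acc, bytes = Bytes} = Block)
  when is_binary(Fragment) ->
    case Bytes + byte_size(Fragment) of
        Total when Total > ?MAX_HEADER_BLOCK_BYTES ->
            %% The caller is expected to send GOAWAY(ENHANCE_YOUR_CALM) and stop.
            {error, enhance_your_calm};
        Total ->
            {ok, Block#block{fragments = [Acc, Fragment], bytes = Total}}
    end.
```

The point of counting before appending is that the check costs one integer addition per frame and never depends on HPACK state, so it fires even if the header block is never completed.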

Tests

Three new CT regression tests in h2_compliance_SUITE:

  • continuation_flood_triggers_enhance_your_calm_test — 320 KB of CONTINUATION traffic must yield GOAWAY/ENHANCE_YOUR_CALM.
  • controlling_process_monitors_new_owner_test — killing the new owner after transfer terminates the connection.
  • send_returns_error_on_closed_socket_test — closing the underlying socket then calling send_data yields {error, _}, not ok.

All 92 tests pass.

…pagate send errors

Three correctness fixes in h2_connection:

CONTINUATION flood (Critical, RFC 9113 §10.5 / §6.5.2). Track raw bytes
of the in-flight HEADERS+CONTINUATION block on the stream; bail with
GOAWAY(ENHANCE_YOUR_CALM) past 256 KB so a peer cannot OOM the node
before max_header_list_size (which acts post-HPACK) can fire.

controlling_process/2 owner liveness (Critical). The original swap
only updated state.owner and left the new owner untracked: the new
owner's death produced no signal and the connection orphaned the
socket. Install a monitor on the new owner and demonitor the previous
one; the initial owner still uses the start_link bidirectional link.
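
A minimal sketch of the monitor swap described above, assuming a map-shaped state with hypothetical owner and owner_mon keys (the real state record is not shown in this PR). The stop reason {owner_down, Reason} is also an assumption made for illustration.

```erlang
-module(owner_monitor_sketch).
-export([swap_owner/2, owner_down/2]).

%% Hypothetical state shape: #{owner := pid(), owner_mon := reference() | undefined}.
%% owner_mon is undefined while the initial owner is covered by the start_link link.

%% Hand the connection to NewOwner: drop the old monitor, install a new one.
swap_owner(NewOwner, #{owner := _, owner_mon := OldMon} = State)
  when is_pid(NewOwner) ->
    case OldMon of
        undefined -> ok;
        %% flush also removes a 'DOWN' message that is already in the mailbox.
        _ -> erlang:demonitor(OldMon, [flush])
    end,
    NewMon = erlang:monitor(process, NewOwner),
    State#{owner := NewOwner, owner_mon := NewMon}.

%% Classify a message: only a 'DOWN' for the current owner's monitor
%% should terminate the connection.
owner_down({'DOWN', Mon, process, _Pid, Reason}, #{owner_mon := Mon}) ->
    {stop, {shutdown, {owner_down, Reason}}};
owner_down(_Msg, _State) ->
    ignore.
```

Matching on the stored monitor reference, rather than on the pid, keeps a stale 'DOWN' from a previous owner from stopping the connection.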

send_frame/2 error propagation (High). Previously always returned ok
even after Transport:send returned {error, closed}, so send_request /
send_response / send_data / send_trailers / send_request_headers
replied ok to the caller after the wire died. Now returns the real
result; API reply paths stop with {shutdown, {send_failed, Reason}}
and propagate {error, Reason} to the in-flight caller. Internal
async sites (window updates, settings) stay best-effort since the
imminent tcp_closed/ssl_closed will surface the failure.
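
The reply-path change can be sketched as follows, assuming a gen_server-style callback return and a map-shaped state with hypothetical transport and socket keys. The transport module is assumed to export send/2 returning ok | {error, Reason}, as gen_tcp and ssl do; reply_after_send/2 is an illustrative name, not a function from this PR.

```erlang
-module(send_frame_sketch).
-export([send_frame/2, reply_after_send/2]).

%% Hypothetical state shape: #{transport := module(), socket := term()}.

%% Return the transport's result instead of always returning ok.
send_frame(Frame, #{transport := Transport, socket := Socket}) ->
    Transport:send(Socket, Frame).

%% Shape of an API reply path: reply ok only if the bytes were accepted,
%% otherwise stop and hand the error to the in-flight caller.
reply_after_send(Frame, State) ->
    case send_frame(Frame, State) of
        ok ->
            {reply, ok, State};
        {error, Reason} ->
            {stop, {shutdown, {send_failed, Reason}}, {error, Reason}, State}
    end.
```

Best-effort sites such as window updates would keep ignoring the result of send_frame/2, per the description above.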

Adds CT regression tests for each.
@benoitc benoitc merged commit a1f6758 into main May 19, 2026
5 checks passed
@benoitc benoitc deleted the harden-connection-state-machine branch May 19, 2026 12:50