relay: bootstrap timeouts, conn cap, and topic set bounds#135
Merged
Conversation
Closes #112. The bootstrap-id HTTP endpoint accept loop had no I/O timeout, no concurrent-connection cap, and no Connection: close header, so a Slowloris-style flood could exhaust file descriptors indefinitely. Wrap each per-connection read and write_all in tokio::time::timeout(BOOTSTRAP_IO_TIMEOUT = 5s), gate accepts on an Arc<Semaphore> sized to MAX_CONCURRENT_BOOTSTRAP_CONNECTIONS = 1024 via try_acquire_owned() (held for the lifetime of the per-connection task and dropped explicitly when the handler returns), and emit a Connection: close header in the response. Closes #113. topic_announce_listener kept an unbounded HashSet of subscribed topic strings; any signed peer could announce arbitrary unique strings to grow the set forever and trigger an unbounded number of network.subscribe calls. Cap the set at MAX_TOPICS = 10_000 (further unique announces are dropped, with one warn log per process once the cap is hit) and validate every incoming topic via the new topic_str_is_valid helper: non-empty, MAX_TOPIC_LEN = 256 bytes max, characters restricted to ASCII alphanumerics or '_ / : . -'. Invalid topics are warn-logged and never inserted into the set or subscribed to. The bootstrap handler and topic-announce listener moved into the crate's lib.rs (previously a stub) so they can be exercised by integration tests in crates/relay/tests/, with main.rs reduced to a thin shim that wires them up. Nothing depends on willow-relay so the new lib has no downstream impact. Regression coverage in crates/relay/tests/bootstrap_endpoint.rs: - bootstrap_endpoint_serves_normal_request_quickly: a normal GET receives a 200 response containing the bootstrap id within ~ms. - bootstrap_response_contains_connection_close_header: the response carries the new Connection: close header. - handle_bootstrap_connection_times_out_slow_reader: paused tokio time + duplex pipe; advancing past BOOTSTRAP_IO_TIMEOUT yields ErrorKind::TimedOut without waiting 5s of wall clock. - bootstrap_listener_drops_connections_when_capacity_saturated: a capacity-1 semaphore + a hold connection forces the next accept to be denied a permit; the test asserts the second client sees EOF (or ConnectionReset) instead of a response. - bootstrap_listener_recovers_after_permit_released: a completed request releases its permit so the next request succeeds again. - topic_str_is_valid unit tests cover valid ASCII, empty string, MAX_TOPIC_LEN boundary, and disallowed characters (whitespace, control bytes, punctuation, non-ASCII, NUL). Progresses #108. https://claude.ai/code/session_018TTGhL645aTR4RZWrRBLPS
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two relay-side DoS hardening fixes:
crates/relay/src/main.rsbootstrap-id HTTP endpoint had no I/O timeout (Slowloris-friendly), no concurrent-connection cap (FD/memory exhaustion), and noConnection: closeheader. Wrap each per-connectionreadandwrite_allintokio::time::timeout(BOOTSTRAP_IO_TIMEOUT = 5s), gate accepts onArc<Semaphore>sized toMAX_CONCURRENT_BOOTSTRAP_CONNECTIONS = 1024viatry_acquire_owned()(held for the lifetime of the per-connection task), and addConnection: closeto the response.topic_announce_listenerkept an unboundedHashSet<String>of subscribed topics, so any signed peer could announce arbitrary unique strings to grow the set forever and trigger an unbounded number ofnetwork.subscribecalls. Cap the set atMAX_TOPICS = 10_000(drops further unique announces with one warn) and validate every incoming string via the newtopic_str_is_validhelper (MAX_TOPIC_LEN = 256, ASCII alphanumerics +_ / : . -). Invalid topics are warn-logged and never inserted/subscribed.The bootstrap handler and topic-announce listener moved into
crates/relay/src/lib.rs(previously a stub) so they can be exercised by integration tests incrates/relay/tests/.main.rsis now a thin shim that wires them up. Nothing else in the workspace depends onwillow-relay, so the new lib has no downstream impact.Progresses #108.
Test plan
New regression coverage in
crates/relay/tests/bootstrap_endpoint.rsplustopic_str_is_validunit tests incrates/relay/src/lib.rs:topic_str_is_valid_accepts_basic_ascii— alphanumerics,/,_,:,.,-all acceptedtopic_str_is_valid_rejects_empty— empty string rejectedtopic_str_is_valid_rejects_too_long—MAX_TOPIC_LEN + 1rejected, exactMAX_TOPIC_LENacceptedtopic_str_is_valid_rejects_disallowed_chars— whitespace, control bytes, punctuation, non-ASCII, NUL all rejectedbootstrap_endpoint_serves_normal_request_quickly— a normalGETreturns the id within msbootstrap_response_contains_connection_close_header— response carries the new headerhandle_bootstrap_connection_times_out_slow_reader— paused tokio time +tokio::io::duplexpipe; advancing pastBOOTSTRAP_IO_TIMEOUTyieldsErrorKind::TimedOutwithout 5s of wall clockbootstrap_listener_drops_connections_when_capacity_saturated— capacity-1 semaphore + a held connection forces the next accept to be denied a permit; the test asserts the second client sees EOF (orConnectionReset) instead of a responsebootstrap_listener_recovers_after_permit_released— once a connection completes its permit is released and the next request succeeds againcargo test -p willow-relay— all 10 tests pass (4 unit, 6 integration), under 0.1s wall clockcargo clippy --workspace --all-targets -- -D warnings— cleancargo fmt --check— cleancargo test --workspacecould not be run to completion in this worktree due to disk pressure from sibling worktrees, butcargo clippy --workspace --all-targets(which type-checks every test target) is clean andwillow-relayis a leaf binary crate with no reverse dependencies.https://claude.ai/code/session_018TTGhL645aTR4RZWrRBLPS