Skip to content

relay: bootstrap timeouts, conn cap, and topic set bounds#135

Merged
intendednull merged 1 commit into
mainfrom
claude/issue-112-113-relay-hardening
Apr 11, 2026
Merged

relay: bootstrap timeouts, conn cap, and topic set bounds#135
intendednull merged 1 commit into
mainfrom
claude/issue-112-113-relay-hardening

Conversation

@intendednull
Copy link
Copy Markdown
Owner

Summary

Two relay-side DoS hardening fixes:

  • Closes [relay] Bootstrap HTTP endpoint has no timeout or connection cap (Slowloris) #112crates/relay/src/main.rs bootstrap-id HTTP endpoint had no I/O timeout (Slowloris-friendly), no concurrent-connection cap (FD/memory exhaustion), and no Connection: close header. Wrap each per-connection read and write_all in tokio::time::timeout(BOOTSTRAP_IO_TIMEOUT = 5s), gate accepts on Arc<Semaphore> sized to MAX_CONCURRENT_BOOTSTRAP_CONNECTIONS = 1024 via try_acquire_owned() (held for the lifetime of the per-connection task), and add Connection: close to the response.
  • Closes [relay] Topic subscription set grows without bound #113topic_announce_listener kept an unbounded HashSet<String> of subscribed topics, so any signed peer could announce arbitrary unique strings to grow the set forever and trigger an unbounded number of network.subscribe calls. Cap the set at MAX_TOPICS = 10_000 (drops further unique announces with one warn) and validate every incoming string via the new topic_str_is_valid helper (MAX_TOPIC_LEN = 256, ASCII alphanumerics + _ / : . -). Invalid topics are warn-logged and never inserted/subscribed.

The bootstrap handler and topic-announce listener moved into crates/relay/src/lib.rs (previously a stub) so they can be exercised by integration tests in crates/relay/tests/. main.rs is now a thin shim that wires them up. Nothing else in the workspace depends on willow-relay, so the new lib has no downstream impact.

Progresses #108.

Test plan

New regression coverage in crates/relay/tests/bootstrap_endpoint.rs plus topic_str_is_valid unit tests in crates/relay/src/lib.rs:

  • topic_str_is_valid_accepts_basic_ascii — alphanumerics, /, _, :, ., - all accepted
  • topic_str_is_valid_rejects_empty — empty string rejected
  • topic_str_is_valid_rejects_too_longMAX_TOPIC_LEN + 1 rejected, exact MAX_TOPIC_LEN accepted
  • topic_str_is_valid_rejects_disallowed_chars — whitespace, control bytes, punctuation, non-ASCII, NUL all rejected
  • bootstrap_endpoint_serves_normal_request_quickly — a normal GET returns the id within ms
  • bootstrap_response_contains_connection_close_header — response carries the new header
  • handle_bootstrap_connection_times_out_slow_reader — paused tokio time + tokio::io::duplex pipe; advancing past BOOTSTRAP_IO_TIMEOUT yields ErrorKind::TimedOut without 5s of wall clock
  • bootstrap_listener_drops_connections_when_capacity_saturated — capacity-1 semaphore + a held connection forces the next accept to be denied a permit; the test asserts the second client sees EOF (or ConnectionReset) instead of a response
  • bootstrap_listener_recovers_after_permit_released — once a connection completes its permit is released and the next request succeeds again
  • cargo test -p willow-relay — all 10 tests pass (4 unit, 6 integration), under 0.1s wall clock
  • cargo clippy --workspace --all-targets -- -D warnings — clean
  • cargo fmt --check — clean

cargo test --workspace could not be run to completion in this worktree due to disk pressure from sibling worktrees, but cargo clippy --workspace --all-targets (which type-checks every test target) is clean and willow-relay is a leaf binary crate with no reverse dependencies.

https://claude.ai/code/session_018TTGhL645aTR4RZWrRBLPS

Closes #112. The bootstrap-id HTTP endpoint accept loop had no I/O
timeout, no concurrent-connection cap, and no Connection: close
header, so a Slowloris-style flood could exhaust file descriptors
indefinitely. Wrap each per-connection read and write_all in
tokio::time::timeout(BOOTSTRAP_IO_TIMEOUT = 5s), gate accepts on an
Arc<Semaphore> sized to MAX_CONCURRENT_BOOTSTRAP_CONNECTIONS = 1024
via try_acquire_owned() (held for the lifetime of the per-connection
task and dropped explicitly when the handler returns), and emit a
Connection: close header in the response.

Closes #113. topic_announce_listener kept an unbounded HashSet of
subscribed topic strings; any signed peer could announce arbitrary
unique strings to grow the set forever and trigger an unbounded
number of network.subscribe calls. Cap the set at MAX_TOPICS = 10_000
(further unique announces are dropped, with one warn log per process
once the cap is hit) and validate every incoming topic via the new
topic_str_is_valid helper: non-empty, MAX_TOPIC_LEN = 256 bytes max,
characters restricted to ASCII alphanumerics or '_ / : . -'. Invalid
topics are warn-logged and never inserted into the set or subscribed
to.

The bootstrap handler and topic-announce listener moved into the
crate's lib.rs (previously a stub) so they can be exercised by
integration tests in crates/relay/tests/, with main.rs reduced to a
thin shim that wires them up. Nothing depends on willow-relay so the
new lib has no downstream impact.

Regression coverage in crates/relay/tests/bootstrap_endpoint.rs:
- bootstrap_endpoint_serves_normal_request_quickly: a normal GET
  receives a 200 response containing the bootstrap id within ~ms.
- bootstrap_response_contains_connection_close_header: the response
  carries the new Connection: close header.
- handle_bootstrap_connection_times_out_slow_reader: paused tokio
  time + duplex pipe; advancing past BOOTSTRAP_IO_TIMEOUT yields
  ErrorKind::TimedOut without waiting 5s of wall clock.
- bootstrap_listener_drops_connections_when_capacity_saturated: a
  capacity-1 semaphore + a hold connection forces the next accept
  to be denied a permit; the test asserts the second client sees
  EOF (or ConnectionReset) instead of a response.
- bootstrap_listener_recovers_after_permit_released: a completed
  request releases its permit so the next request succeeds again.
- topic_str_is_valid unit tests cover valid ASCII, empty string,
  MAX_TOPIC_LEN boundary, and disallowed characters (whitespace,
  control bytes, punctuation, non-ASCII, NUL).

Progresses #108.

https://claude.ai/code/session_018TTGhL645aTR4RZWrRBLPS
@intendednull intendednull merged commit 726b37e into main Apr 11, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[relay] Topic subscription set grows without bound [relay] Bootstrap HTTP endpoint has no timeout or connection cap (Slowloris)

2 participants