
fix(relay): cap TopicAnnounce + per-signer slots (SEC-V-06)#453

Merged
intendednull merged 1 commit into auto-fix/batch-2026-04-28-002530 from auto-fix/issue-235-topic-announce-caps
Apr 28, 2026

Conversation

@intendednull
Owner

what

Three caps on topic_announce_listener (SEC-V-06).

  • MAX_TOPICS_PER_ANNOUNCE = 64 — drop the announce before the per-topic loop, so an oversized message costs no work.
  • MAX_TOPICS_PER_SIGNER = 100 with LRU eviction, so one peer cannot monopolise the global slot table.
  • WARN_RATE_LIMIT = 60s — replaces the once-per-session warned_full flag, so operators see ongoing pressure without log spam.

why

The old listener iterated over an unbounded topics: Vec<String>: a 256 KB envelope of 2-byte topics meant ~128 000 blake3 hashes per message (CPU amplification). Topics were also never removed once added, so one peer could fill the 10 000-slot table and starve everyone else.

how

  • Pure AnnounceState struct: a HashMap<String, usize> refcount plus a per-signer VecDeque<String> LRU. process_topic returns TopicActions { subscribe, unsubscribe, rejected_global, evicted_for_signer }, and the caller drives the network I/O. No new dependency.
  • Refcounting makes slot freeing correct: eviction decrements the count, and only refcount == 0 triggers network.unsubscribe. Without it, eviction would free only the per-signer slot and the global table would never drain.
  • LRU = VecDeque. With n ≤ 100, an O(n) probe is trivial. A re-announce promotes the entry (remove + push_back).
  • Rate-limited warn helper should_emit_warn(last, now, interval), used at three sites: the per-message cap, the global cap, and per-signer eviction.

tradeoffs

The per-message cap could live in willow-common::unpack_wire to protect every consumer. Rejected: bincode exposes no inline length cap without a custom Visitor, and the relay is the only production consumer today. Wire-side defense-in-depth is a follow-up for when more consumers are added; the cap sits at the relay listener, where the load-bearing work happens.

tests (relay-tier, lowest covering behaviour)

  • topic_announce_listener_rejects_oversized_announce — a 65-topic announce is dropped; the sentinel announce still works
  • announce_state_per_signer_lru_evicts_oldest — the 101st topic evicts t0 and subscribes the new one
  • announce_state_per_signer_lru_does_not_starve_other_signers — A fills its quota, B can still subscribe
  • announce_state_repeat_announce_promotes_lru_no_resubscribe — LRU touch, no network call
  • announce_state_shared_topic_refcount_keeps_subscription — the refcount keeps the subscription alive
  • announce_state_rejects_at_global_cap — 10 000 slots filled across signers, an outsider is rejected
  • should_emit_warn_rate_limits_to_one_per_window — direct test of the helper

Removed topic_announce_listener_enforces_max_topics_cap — its single announce of 10 001 topics is now blocked by the per-message cap, and global-cap behaviour is exercised at the unit tier instead. State-machine logic only, so state tier per the CLAUDE.md decision tree.

verify

Local — the sub-PR base ≠ main, so CI does not fire. The local gate is green:

  • cargo fmt --check clean
  • cargo clippy --workspace --all-targets -- -D warnings clean
  • cargo test --workspace — all pass (62/26/45/18/5/13/12/31/227/35/16/73/28/13 across crates, no failures)
  • cargo check --target wasm32-unknown-unknown -p willow-identity ... -p willow-web clean

Refs #235


Generated by Claude Code

Three layered caps on the TopicAnnounce listener:

- MAX_TOPICS_PER_ANNOUNCE = 64 — drop announce before any per-topic
  work runs. Prior code iterated unbounded; a 256 KB envelope of 2-byte
  topics meant ~128 000 blake3 hashes per message (CPU amplification).
- MAX_TOPICS_PER_SIGNER = 100 with per-signer LRU eviction. Stops one
  peer from monopolising the global slot table (MAX_TOPICS = 10 000)
  and starving legitimate clients. Hand-rolled VecDeque + HashMap
  refcount: insert appends, eviction pops front, refcount==0 triggers
  network.unsubscribe so the global slot is actually freed.
- WARN_RATE_LIMIT = 60s. Replaces the once-per-session warned_full
  flag so operators see ongoing pressure without log spam. Applied to
  per-message-cap, global-cap, and per-signer-eviction warns.

Refactored listener body around a pure AnnounceState struct that
returns TopicActions describing what to do on the network — keeps the
state machine unit-testable without driving MemNetwork at 10 000-topic
scale.

Tests added (relay-tier, lowest covering behaviour):
- topic_announce_listener_rejects_oversized_announce — integration
  test: 65-topic announce dropped, sentinel announce still works.
- announce_state_per_signer_lru_evicts_oldest — fills 100 topics for
  one signer, the 101st evicts t0 and subscribes to the new one.
- announce_state_per_signer_lru_does_not_starve_other_signers — A
  fills its quota, B can still subscribe.
- announce_state_repeat_announce_promotes_lru_no_resubscribe — LRU
  touch on re-announce, no network call.
- announce_state_shared_topic_refcount_keeps_subscription —
  refcount keeps subscription alive when one signer evicts.
- announce_state_rejects_at_global_cap — fills 10 000 slots across
  multiple signers, fresh signer's new topic rejected.
- should_emit_warn_rate_limits_to_one_per_window — directly exercises
  the rate-limit helper.
- topic_announce_listener_enforces_max_topics_cap removed (its single-
  announce-of-10001 setup is now blocked by the per-message cap; the
  global-cap behaviour is exercised at the unit tier instead).

Tradeoff considered: enforcing the per-message cap in willow-common's
unpack_wire would protect every consumer, but bincode does not expose
inline length caps without a custom Visitor and the relay is the only
production consumer; defense-in-depth at the wire layer is a follow-up
if we add more consumers. Per-message cap lives in the relay listener
where the load-bearing work happens.

Refs #235
@intendednull intendednull merged commit 3ee8a98 into auto-fix/batch-2026-04-28-002530 Apr 28, 2026
