4 changes: 4 additions & 0 deletions .claude/skills/resolving-issues/SKILL.md
@@ -59,6 +59,10 @@ Why this shape:
- if closed issue is a sub-issue of a parent, run parent close-out check (see ## Parent issue close-out) — close parent too if this was its last open child

Closing GH issues = metadata work, not code work; allowed under "coordinator never codes." Implementers are dispatched only for *unresolved* issues. This pass typically clears audit lessons (already folded into skills), structural-deps follow-ups (still structural), and audit findings closed by intervening fixes — saves dispatch overhead on no-op work.

**When the audit-to-fix gap is same-day, expect ~zero already-fixed hits.** The sweep yields most when the audit ran days/weeks before the fix run (intervening PRs land between audit and fix). When the audit was filed earlier the same day against the current `main`, no fixes can have landed between audit and fix run by definition — the sweep correctly returns empty. Don't over-invest sweep effort in this case; one quick `git log <audit-ref>..origin/main` check on each pick's path is enough to confirm.

**Ambiguous-fix-path (audit premise real, fix is a design call) — coordinator-skip without close.** Some audit findings flag a real concern but the prescribed fix is ambiguous-by-design with ≥ 2 legitimate approaches that need a design decision the coordinator can't make unilaterally. Common shape: "audit-glob doesn't catch in-file `mod tests` — move to external test file OR update glob." Both are valid; picking either silently is wrong. **Action:** during step 6 picks, if the audit's literal fix conflicts with HEAD reality (e.g. code already satisfies the surface premise) AND the underlying concern requires a design call, **skip the issue from this run's dispatch queue without closing it**. Note it in the run-end `## Lessons Learned` so the next session has full context. Don't comment on the issue — the audit description already captures the concern; a "skipped, design call needed" comment adds noise without progress. The next run with fresh eyes / accumulated context can decide.
7. Per remaining issue, sequential, max 10 per run:
- **Pre-dispatch sync:** before spawning each implementer, in the coordinator's checkout:
```bash
# …
```
15 changes: 13 additions & 2 deletions .github/workflows/deploy.yml
@@ -72,6 +72,17 @@ jobs:

- name: Verify deployment
run: |
-sleep 3
-curl -sf -o /dev/null -w "Web: HTTP %{http_code}\n" https://willow.intendednull.com/
+# Poll the web endpoint until it serves (max ~30s) instead of a blind
+# 3s sleep: avoids flaking when the relay is slow to bind, while the
+# 15-attempt cap still surfaces a silent deploy failure.
+for i in $(seq 1 15); do
+  if curl -sf -o /dev/null -w "Web: HTTP %{http_code}\n" https://willow.intendednull.com/; then
+    break
+  fi
+  if [ "$i" -eq 15 ]; then
+    echo "verify: HTTP probe never succeeded after 15 attempts (~30s)" >&2
+    exit 1
+  fi
+  sleep 2
+done
sshpass -p '${{ secrets.DEPLOY_PASSWORD }}' ssh -o StrictHostKeyChecking=no root@willow.intendednull.com 'systemctl is-active willow-relay'
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -342,7 +342,7 @@ All shared state derived from per-author Merkle DAG of signed events. `willow-st
- **EventDag** (`crates/state/src/dag.rs`): in-memory store of all known events, indexed by `EventHash`. `EventDag::insert` validates signature, genesis, `prev`/`deps` linkage. No explicit "merge" — DAG converges as events arrive.
- **ServerState** (`crates/state/src/server.rs`): materialized state derived by walking the DAG.
- **`materialize::apply_incremental(state, event)`**: ONLY public mutation entry point. Pure. Internally calls `apply_event` + `required_permission` for authority checks.
- - **Permission model**: Owner = root of trust. Fine-grained permissions (SyncProvider, ManageChannels, ManageRoles, KickMembers, SendMessages, CreateInvite, Administrator) granted via `GrantPermission` events.
+ - **Permission model**: Owner = root of trust. Fine-grained permissions (SyncProvider, ManageChannels, ManageRoles, SendMessages, CreateInvite) granted via `GrantPermission` events. Admin status is structurally separate — managed exclusively through `ProposedAction` + vote path, never via `GrantPermission`. Kicks are an admin-only `ProposedAction` (no granular "can kick" permission).
- **Sync** (`crates/state/src/sync.rs`): `HeadsSummary` = compact per-author DAG state for efficient sync; `PendingBuffer` holds events arriving before their `prev` chain predecessors.
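
The permission-model bullet above encodes an invariant that is easy to miss: the grantable permission set and the admin path are separate types, not one enum with an `Administrator` variant. A minimal sketch of that shape (every name not listed in the bullet, such as `Action`, `MemberId`, and the field names, is hypothetical rather than willow-state's actual API):

```rust
// Sketch only: illustrates the grant/admin split described above.
// Variant names match the bullet; everything else is hypothetical.
type MemberId = [u8; 32];

/// Granular permissions, grantable via `GrantPermission` events.
/// Deliberately no `Administrator` and no `KickMembers` variant.
enum Permission {
    SyncProvider,
    ManageChannels,
    ManageRoles,
    SendMessages,
    CreateInvite,
}

/// Admin promotion and kicks only travel the `ProposedAction` + vote path.
enum Action {
    PromoteAdmin { target: MemberId },
    Kick { target: MemberId },
}

enum EventKind {
    GrantPermission { grantee: MemberId, permission: Permission },
    ProposedAction { action: Action },
    // ...other event kinds elided
}
```

Under this shape the type system enforces the rule: there is no `Permission` value a `GrantPermission` event could carry that confers admin or kick rights.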

### Trust Model
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default.

17 changes: 13 additions & 4 deletions crates/client/src/worker_cache.rs
@@ -58,7 +58,15 @@ impl WorkerCache {

/// Evict workers that haven't sent a heartbeat within the TTL.
pub fn evict_stale(&mut self) {
-        let cutoff = Instant::now() - self.ttl;
+        self.evict_stale_at(Instant::now());
+    }
+
+    /// Evict workers stale relative to an injected `now`.
+    ///
+    /// Used by tests to remove timing dependencies; production callers should
+    /// use [`Self::evict_stale`].
+    pub(crate) fn evict_stale_at(&mut self, now: Instant) {
+        let cutoff = now - self.ttl;
self.workers.retain(|_, info| info.last_seen > cutoff);
}

@@ -171,10 +179,11 @@ mod tests {

#[test]
fn evict_stale_removes_expired() {
-        let mut cache = WorkerCache::new(Duration::from_millis(1));
+        let mut cache = WorkerCache::new(Duration::from_secs(30));
         cache.update(&make_replay_announcement(gen_id(), vec!["srv-1"]));
-        std::thread::sleep(Duration::from_millis(10));
-        cache.evict_stale();
+        // Simulate "now" being well past the TTL boundary instead of sleeping.
+        let future = Instant::now() + Duration::from_secs(60);
+        cache.evict_stale_at(future);
assert!(cache.is_empty());
}

18 changes: 18 additions & 0 deletions crates/common/src/lib.rs
@@ -24,3 +24,21 @@ pub use worker_types::*;
/// `willow-common` (already a dep of both crates) guarantees they stay
/// aligned at compile time.
pub const SYNC_BATCH_LIMIT: usize = 10_000;

/// Maximum number of `(author, head)` entries accepted from a peer-supplied
/// [`willow_state::HeadsSummary`] in a single sync / history call.
///
/// Single source of truth shared by:
/// - `willow-storage` — `sync_since` / `history` build SQL by concatenating
/// one `(author = ? AND seq <op> ?)` fragment per entry; an oversize
/// summary would either exceed rusqlite's bind-parameter limit
/// (default 32766) or waste CPU compiling a giant prepared statement.
/// - `willow-replay` — `handle_request(Sync)` iterates per-author into a
/// `BTreeMap` then walks the in-memory DAG; an oversize summary forces
/// O(N) work and is the same DoS shape as the storage path.
///
/// 256 is well above any plausible honest server's distinct-author count
/// while keeping bind-parameter and BTreeMap-construction costs bounded.
/// Both production sites MUST agree on this value, so the canonical
/// definition lives here alongside [`SYNC_BATCH_LIMIT`].
pub const MAX_AUTHORS_PER_SYNC: usize = 256;
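
The storage bullet in the doc comment above summarizes the query shape without showing it. A rough sketch of how a `sync_since`-style call might assemble its SQL, and why each entry costs exactly two bind parameters (table and column names are illustrative; the real `willow-storage` code differs):

```rust
use rusqlite::{params_from_iter, types::Value, Connection};

const MAX_AUTHORS_PER_SYNC: usize = 256; // mirrors the canonical const above

// Hypothetical entry shape: (author key bytes, last seq the peer has seen).
fn sync_since_sketch(
    conn: &Connection,
    heads: &[(Vec<u8>, i64)],
) -> Result<Vec<Vec<u8>>, String> {
    // The guard the cap exists for: reject before building any SQL.
    // (In willow, the role.rs caller maps this to WorkerResponse::Denied.)
    if heads.is_empty() || heads.len() > MAX_AUTHORS_PER_SYNC {
        return Err(format!("refusing {} head entries", heads.len()));
    }

    // One fragment per entry, two binds each: 256 entries => 512 binds,
    // far below SQLite's default limit of 32766.
    let clause = heads
        .iter()
        .map(|_| "(author = ? AND seq > ?)")
        .collect::<Vec<_>>()
        .join(" OR ");
    let sql = format!("SELECT payload FROM events WHERE {clause} ORDER BY seq");

    let binds = heads
        .iter()
        .flat_map(|(author, seq)| [Value::Blob(author.clone()), Value::Integer(*seq)]);

    let mut stmt = conn.prepare(&sql).map_err(|e| e.to_string())?;
    let rows = stmt
        .query_map(params_from_iter(binds), |row| row.get::<_, Vec<u8>>(0))
        .map_err(|e| e.to_string())?;
    rows.collect::<Result<Vec<_>, _>>().map_err(|e| e.to_string())
}
```

The `<op>` placeholder in the deleted store.rs doc comment suggests the real code parameterizes the comparison operator per call site; this sketch hardcodes `>`.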
1 change: 0 additions & 1 deletion crates/identity/Cargo.toml
@@ -20,5 +20,4 @@ getrandom_02 = { package = "getrandom", version = "0.2", features = ["js"] }
getrandom_03 = { package = "getrandom", version = "0.3", features = ["wasm_js"] }

[dev-dependencies]
-tokio = { workspace = true }
tempfile = "3"
1 change: 1 addition & 0 deletions crates/replay/Cargo.toml
@@ -12,6 +12,7 @@ willow-worker = { path = "../worker" }
willow-state = { path = "../state" }
willow-identity = { path = "../identity" }
willow-network = { path = "../network" }
willow-common = { path = "../common" }
clap = { version = "4", features = ["derive"] }
dirs = "6"
anyhow = { workspace = true }
90 changes: 90 additions & 0 deletions crates/replay/src/role.rs
@@ -7,6 +7,7 @@
use std::collections::{BTreeMap, HashMap};

use tracing::warn;
use willow_common::MAX_AUTHORS_PER_SYNC;
use willow_state::{
apply_incremental, Event, EventDag, EventHash, EventKind, HeadsSummary, InsertError,
PendingBuffer, ServerState, Snapshot, DEFAULT_PENDING_MAX_AGE_MS, DEFAULT_PENDING_MAX_ENTRIES,
@@ -273,6 +274,21 @@ impl WorkerRole for ReplayRole {
fn handle_request(&mut self, req: WorkerRequest) -> WorkerResponse {
match req {
WorkerRequest::Sync { server_id, heads } => {
// Reject peer-supplied summaries that would force O(N)
// BTreeMap construction and DAG walks (see
// `MAX_AUTHORS_PER_SYNC`). Mirrors the storage cap added in
// PR #507 / b075140; gated before any allocation so a hostile
// request fails fast.
if heads.heads.len() > MAX_AUTHORS_PER_SYNC {
return WorkerResponse::Denied {
reason: format!(
"too many heads in sync request: {} > {}",
heads.heads.len(),
MAX_AUTHORS_PER_SYNC
),
};
}

let data = match self.servers.get(&server_id) {
Some(d) => d,
None => {
@@ -1458,6 +1474,80 @@ mod tests {
}
}

// ── Issue #514: oversize HeadsSummary rejection ──────────────────────
//
// Mirrors the storage cap added by PR #507 / b075140. Without a guard
// here, a malicious peer could send a multi-thousand-entry HeadsSummary
// and force replay to do per-author BTreeMap inserts and DAG walks for
// every entry — same DoS shape as the storage path the sibling PR fixed.

/// Build a `HeadsSummary` with `n` distinct random authors. Mirrors the
/// helper in `crates/storage/src/store.rs` (sibling cap test infra).
fn heads_summary_with_authors(n: usize) -> HeadsSummary {
use willow_state::AuthorHead;
let mut heads = BTreeMap::new();
for _ in 0..n {
let id = Identity::generate();
heads.insert(
id.endpoint_id(),
AuthorHead {
seq: 1,
hash: EventHash::ZERO,
},
);
}
HeadsSummary { heads }
}

/// A peer-supplied `HeadsSummary` with more than `MAX_AUTHORS_PER_SYNC`
/// entries must be rejected by `handle_request(Sync)` before any
/// per-author BTreeMap construction or DAG walk occurs.
#[test]
fn sync_request_rejects_oversize_heads() {
use willow_common::MAX_AUTHORS_PER_SYNC;
let mut role = ReplayRole::new(ReplayConfig::default());
let (_, _) = setup_server(&mut role, "srv-1");

let oversize = heads_summary_with_authors(MAX_AUTHORS_PER_SYNC + 1);
let resp = role.handle_request(WorkerRequest::Sync {
server_id: "srv-1".to_string(),
heads: oversize,
});

match resp {
WorkerResponse::Denied { reason } => {
assert!(
reason.contains("too many heads"),
"denial reason should mention the cap; got: {reason}"
);
}
other => panic!("expected Denied for oversize heads, got: {other:?}"),
}
}

/// `handle_request(Sync)` must accept exactly `MAX_AUTHORS_PER_SYNC`
/// entries — the cap is inclusive on the legal side.
#[test]
fn sync_request_accepts_exact_cap_heads() {
use willow_common::MAX_AUTHORS_PER_SYNC;
let mut role = ReplayRole::new(ReplayConfig::default());
let (_, _) = setup_server(&mut role, "srv-1");

let at_cap = heads_summary_with_authors(MAX_AUTHORS_PER_SYNC);
let resp = role.handle_request(WorkerRequest::Sync {
server_id: "srv-1".to_string(),
heads: at_cap,
});

// The peer's heads mention authors we don't know, so events_since
// returns the genesis event; our store has 1 author the peer doesn't,
// so they_are_behind is true. Either branch is acceptable — the only
// forbidden outcome is Denied for at-cap input.
if let WorkerResponse::Denied { reason } = resp {
panic!("at-cap heads must not be denied; got reason: {reason}");
}
}

/// When the configured `pending_max_entries` is exceeded, the oldest
/// entries are evicted and `pending_count()` reflects the cap.
#[test]
18 changes: 1 addition & 17 deletions crates/storage/src/store.rs
@@ -20,25 +20,9 @@
//! never reordered or rewritten so existing databases stay consistent.

use rusqlite::{params, Connection};
-use willow_common::SYNC_BATCH_LIMIT;
+use willow_common::{MAX_AUTHORS_PER_SYNC, SYNC_BATCH_LIMIT};
use willow_state::{Event, EventKind, HeadsSummary};

-/// Maximum number of `(author, head)` entries accepted from a peer-supplied
-/// [`HeadsSummary`] in a single `sync_since` / `history` call.
-///
-/// `sync_since` and `history` build their SQL clauses by concatenating one
-/// `(author = ? AND seq <op> ?)` fragment per entry. A peer-supplied summary
-/// with thousands of entries (still well within the transport envelope cap)
-/// would either exceed rusqlite's bind-parameter limit (default 32766) or
-/// waste CPU compiling a giant prepared statement.
-///
-/// 256 is well above any plausible honest server's distinct-author count
-/// while keeping the bind-parameter count (~512 per call) and the SQL
-/// expression-tree depth safely below SQLite's defaults
-/// (32766 binds, depth 1000). Requests over the cap are rejected at the store layer;
-/// the caller in `role.rs` maps the error to a `WorkerResponse::Denied`.
-pub const MAX_AUTHORS_PER_SYNC: usize = 256;

/// Ordered list of schema migrations. Each entry is run inside its own
/// transaction the first time the database is opened. Once a migration is
/// shipped, never edit or reorder it — only append new entries.
5 changes: 4 additions & 1 deletion crates/web/src/app.rs
@@ -922,7 +922,10 @@ pub fn App() -> impl IntoView {

// Request mic permission SYNCHRONOUSLY in the click handler
// to preserve the user gesture chain (required on mobile).
-            let window = web_sys::window().unwrap();
+            let Some(window) = web_sys::window() else {
+                tracing::error!("No browser window context for getUserMedia");
+                return;
+            };
let navigator = window.navigator();
let Ok(media_devices) = navigator.media_devices() else {
tracing::error!("No media devices available");