4 changes: 4 additions & 0 deletions .claude/skills/resolving-issues/SKILL.md
@@ -59,6 +59,10 @@ Why this shape:
- if closed issue is a sub-issue of a parent, run parent close-out check (see ## Parent issue close-out) — close parent too if this was its last open child

Closing GH issues = metadata work, not code work; allowed under "coordinator never codes." Implementers are dispatched only for *unresolved* issues. This pass typically clears audit lessons (already folded into skills), structural-deps follow-ups (still structural), and audit findings closed by intervening fixes — saves dispatch overhead on no-op work.

**When the audit-to-fix gap is same-day, expect ~zero already-fixed hits.** The sweep yields most when the audit ran days/weeks before the fix run (intervening PRs land between audit and fix). When the audit was filed earlier the same day against the current `main`, no fixes can have landed between audit and fix run by definition — the sweep correctly returns empty. Don't over-invest sweep effort in this case; one quick `git log <audit-ref>..origin/main` check on each pick's path is enough to confirm.

**Ambiguous-fix-path (audit premise real, fix is a design call) — coordinator-skip without close.** Some audit findings flag a real concern but the prescribed fix is ambiguous-by-design with ≥ 2 legitimate approaches that need a design decision the coordinator can't make unilaterally. Common shape: "audit-glob doesn't catch in-file `mod tests` — move to external test file OR update glob." Both are valid; picking either silently is wrong. **Action:** during step 6 picks, if the audit's literal fix conflicts with HEAD reality (e.g. code already satisfies the surface premise) AND the underlying concern requires a design call, **skip the issue from this run's dispatch queue without closing it**. Note it in the run-end `## Lessons Learned` so the next session has full context. Don't comment on the issue — the audit description already captures the concern; a "skipped, design call needed" comment adds noise without progress. The next run with fresh eyes / accumulated context can decide.
7. Per remaining issue, sequential, max 10 per run:
- **Pre-dispatch sync:** before spawning each implementer, in the coordinator's checkout:
```bash
# …
```
15 changes: 13 additions & 2 deletions .github/workflows/deploy.yml
@@ -72,6 +72,17 @@ jobs:

- name: Verify deployment
run: |
-sleep 3
-curl -sf -o /dev/null -w "Web: HTTP %{http_code}\n" https://willow.intendednull.com/
+# Poll the web endpoint until it serves (max ~30s) instead of a blind
+# 3s sleep: avoids flaking when the relay is slow to bind, while the
+# 15-attempt cap still surfaces a silent deploy failure.
+for i in $(seq 1 15); do
+  if curl -sf -o /dev/null -w "Web: HTTP %{http_code}\n" https://willow.intendednull.com/; then
+    break
+  fi
+  if [ "$i" -eq 15 ]; then
+    echo "verify: HTTP probe never succeeded after 15 attempts (~30s)" >&2
+    exit 1
+  fi
+  sleep 2
+done
sshpass -p '${{ secrets.DEPLOY_PASSWORD }}' ssh -o StrictHostKeyChecking=no root@willow.intendednull.com 'systemctl is-active willow-relay'
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -342,7 +342,7 @@ All shared state derived from per-author Merkle DAG of signed events. `willow-st
- **EventDag** (`crates/state/src/dag.rs`): in-memory store of all known events, indexed by `EventHash`. `EventDag::insert` validates signature, genesis, `prev`/`deps` linkage. No explicit "merge" — DAG converges as events arrive.
- **ServerState** (`crates/state/src/server.rs`): materialized state derived by walking the DAG.
- **`materialize::apply_incremental(state, event)`**: ONLY public mutation entry point. Pure. Internally calls `apply_event` + `required_permission` for authority checks.
- - **Permission model**: Owner = root of trust. Fine-grained permissions (SyncProvider, ManageChannels, ManageRoles, KickMembers, SendMessages, CreateInvite, Administrator) granted via `GrantPermission` events.
+ - **Permission model**: Owner = root of trust. Fine-grained permissions (SyncProvider, ManageChannels, ManageRoles, SendMessages, CreateInvite) granted via `GrantPermission` events. Admin status is structurally separate — managed exclusively through `ProposedAction` + vote path, never via `GrantPermission`. Kicks are an admin-only `ProposedAction` (no granular "can kick" permission).
- **Sync** (`crates/state/src/sync.rs`): `HeadsSummary` = compact per-author DAG state for efficient sync; `PendingBuffer` holds events arriving before their `prev` chain predecessors.
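
The permission-model bullet above encodes an invariant that is easy to miss: the grantable permission set and the admin path are separate types, not one enum with an `Administrator` variant. A minimal sketch of that shape (every name not listed in the bullet, such as `Action`, `MemberId`, and the field names, is hypothetical rather than willow-state's actual API):

```rust
// Sketch only: illustrates the grant/admin split described above.
// Variant names match the bullet; everything else is hypothetical.
type MemberId = [u8; 32];

/// Granular permissions, grantable via `GrantPermission` events.
/// Deliberately no `Administrator` and no `KickMembers` variant.
enum Permission {
    SyncProvider,
    ManageChannels,
    ManageRoles,
    SendMessages,
    CreateInvite,
}

/// Admin promotion and kicks only travel the `ProposedAction` + vote path.
enum Action {
    PromoteAdmin { target: MemberId },
    Kick { target: MemberId },
}

enum EventKind {
    GrantPermission { grantee: MemberId, permission: Permission },
    ProposedAction { action: Action },
    // ...other event kinds elided
}
```

Under this shape the type system enforces the rule: there is no `Permission` value a `GrantPermission` event could carry that confers admin or kick rights.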

### Trust Model
2 changes: 1 addition & 1 deletion Cargo.lock

Some generated files are not rendered by default.

17 changes: 13 additions & 4 deletions crates/client/src/worker_cache.rs
@@ -58,7 +58,15 @@ impl WorkerCache {

/// Evict workers that haven't sent a heartbeat within the TTL.
pub fn evict_stale(&mut self) {
-        let cutoff = Instant::now() - self.ttl;
+        self.evict_stale_at(Instant::now());
+    }
+
+    /// Evict workers stale relative to an injected `now`.
+    ///
+    /// Used by tests to remove timing dependencies; production callers should
+    /// use [`Self::evict_stale`].
+    pub(crate) fn evict_stale_at(&mut self, now: Instant) {
+        let cutoff = now - self.ttl;
self.workers.retain(|_, info| info.last_seen > cutoff);
}

@@ -171,10 +179,11 @@ mod tests {

#[test]
fn evict_stale_removes_expired() {
-        let mut cache = WorkerCache::new(Duration::from_millis(1));
+        let mut cache = WorkerCache::new(Duration::from_secs(30));
         cache.update(&make_replay_announcement(gen_id(), vec!["srv-1"]));
-        std::thread::sleep(Duration::from_millis(10));
-        cache.evict_stale();
+        // Simulate "now" being well past the TTL boundary instead of sleeping.
+        let future = Instant::now() + Duration::from_secs(60);
+        cache.evict_stale_at(future);
assert!(cache.is_empty());
}

18 changes: 18 additions & 0 deletions crates/common/src/lib.rs
@@ -24,3 +24,21 @@ pub use worker_types::*;
/// `willow-common` (already a dep of both crates) guarantees they stay
/// aligned at compile time.
pub const SYNC_BATCH_LIMIT: usize = 10_000;

/// Maximum number of `(author, head)` entries accepted from a peer-supplied
/// [`willow_state::HeadsSummary`] in a single sync / history call.
///
/// Single source of truth shared by:
/// - `willow-storage` — `sync_since` / `history` build SQL by concatenating
/// one `(author = ? AND seq <op> ?)` fragment per entry; an oversize
/// summary would either exceed rusqlite's bind-parameter limit
/// (default 32766) or waste CPU compiling a giant prepared statement.
/// - `willow-replay` — `handle_request(Sync)` iterates per-author into a
/// `BTreeMap` then walks the in-memory DAG; an oversize summary forces
/// O(N) work and is the same DoS shape as the storage path.
///
/// 256 is well above any plausible honest server's distinct-author count
/// while keeping bind-parameter and BTreeMap-construction costs bounded.
/// Both production sites MUST agree on this value, so the canonical
/// definition lives here alongside [`SYNC_BATCH_LIMIT`].
pub const MAX_AUTHORS_PER_SYNC: usize = 256;
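
The storage bullet in the doc comment above summarizes the query shape without showing it. A rough sketch of how a `sync_since`-style call might assemble its SQL, and why each entry costs exactly two bind parameters (table and column names are illustrative; the real `willow-storage` code differs):

```rust
use rusqlite::{params_from_iter, types::Value, Connection};

const MAX_AUTHORS_PER_SYNC: usize = 256; // mirrors the canonical const above

// Hypothetical entry shape: (author key bytes, last seq the peer has seen).
fn sync_since_sketch(
    conn: &Connection,
    heads: &[(Vec<u8>, i64)],
) -> Result<Vec<Vec<u8>>, String> {
    // The guard the cap exists for: reject before building any SQL.
    // (In willow, the role.rs caller maps this to WorkerResponse::Denied.)
    if heads.is_empty() || heads.len() > MAX_AUTHORS_PER_SYNC {
        return Err(format!("refusing {} head entries", heads.len()));
    }

    // One fragment per entry, two binds each: 256 entries => 512 binds,
    // far below SQLite's default limit of 32766.
    let clause = heads
        .iter()
        .map(|_| "(author = ? AND seq > ?)")
        .collect::<Vec<_>>()
        .join(" OR ");
    let sql = format!("SELECT payload FROM events WHERE {clause} ORDER BY seq");

    let binds = heads
        .iter()
        .flat_map(|(author, seq)| [Value::Blob(author.clone()), Value::Integer(*seq)]);

    let mut stmt = conn.prepare(&sql).map_err(|e| e.to_string())?;
    let rows = stmt
        .query_map(params_from_iter(binds), |row| row.get::<_, Vec<u8>>(0))
        .map_err(|e| e.to_string())?;
    rows.collect::<Result<Vec<_>, _>>().map_err(|e| e.to_string())
}
```

The `<op>` placeholder in the deleted store.rs doc comment suggests the real code parameterizes the comparison operator per call site; this sketch hardcodes `>`.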
1 change: 0 additions & 1 deletion crates/identity/Cargo.toml
@@ -20,5 +20,4 @@ getrandom_02 = { package = "getrandom", version = "0.2", features = ["js"] }
getrandom_03 = { package = "getrandom", version = "0.3", features = ["wasm_js"] }

[dev-dependencies]
-tokio = { workspace = true }
tempfile = "3"
1 change: 1 addition & 0 deletions crates/replay/Cargo.toml
@@ -12,6 +12,7 @@ willow-worker = { path = "../worker" }
willow-state = { path = "../state" }
willow-identity = { path = "../identity" }
willow-network = { path = "../network" }
willow-common = { path = "../common" }
clap = { version = "4", features = ["derive"] }
dirs = "6"
anyhow = { workspace = true }
90 changes: 90 additions & 0 deletions crates/replay/src/role.rs
@@ -7,6 +7,7 @@
use std::collections::{BTreeMap, HashMap};

use tracing::warn;
use willow_common::MAX_AUTHORS_PER_SYNC;
use willow_state::{
apply_incremental, Event, EventDag, EventHash, EventKind, HeadsSummary, InsertError,
PendingBuffer, ServerState, Snapshot, DEFAULT_PENDING_MAX_AGE_MS, DEFAULT_PENDING_MAX_ENTRIES,
@@ -273,6 +274,21 @@ impl WorkerRole for ReplayRole {
fn handle_request(&mut self, req: WorkerRequest) -> WorkerResponse {
match req {
WorkerRequest::Sync { server_id, heads } => {
// Reject peer-supplied summaries that would force O(N)
// BTreeMap construction and DAG walks (see
// `MAX_AUTHORS_PER_SYNC`). Mirrors the storage cap added in
// PR #507 / b075140; gated before any allocation so a hostile
// request fails fast.
if heads.heads.len() > MAX_AUTHORS_PER_SYNC {
return WorkerResponse::Denied {
reason: format!(
"too many heads in sync request: {} > {}",
heads.heads.len(),
MAX_AUTHORS_PER_SYNC
),
};
}

let data = match self.servers.get(&server_id) {
Some(d) => d,
None => {
@@ -1458,6 +1474,80 @@ mod tests {
}
}

// ── Issue #514: oversize HeadsSummary rejection ──────────────────────
//
// Mirrors the storage cap added by PR #507 / b075140. Without a guard
// here, a malicious peer could send a multi-thousand-entry HeadsSummary
// and force replay to do per-author BTreeMap inserts and DAG walks for
// every entry — same DoS shape as the storage path the sibling PR fixed.

/// Build a `HeadsSummary` with `n` distinct random authors. Mirrors the
/// helper in `crates/storage/src/store.rs` (sibling cap test infra).
fn heads_summary_with_authors(n: usize) -> HeadsSummary {
use willow_state::AuthorHead;
let mut heads = BTreeMap::new();
for _ in 0..n {
let id = Identity::generate();
heads.insert(
id.endpoint_id(),
AuthorHead {
seq: 1,
hash: EventHash::ZERO,
},
);
}
HeadsSummary { heads }
}

/// A peer-supplied `HeadsSummary` with more than `MAX_AUTHORS_PER_SYNC`
/// entries must be rejected by `handle_request(Sync)` before any
/// per-author BTreeMap construction or DAG walk occurs.
#[test]
fn sync_request_rejects_oversize_heads() {
use willow_common::MAX_AUTHORS_PER_SYNC;
let mut role = ReplayRole::new(ReplayConfig::default());
let (_, _) = setup_server(&mut role, "srv-1");

let oversize = heads_summary_with_authors(MAX_AUTHORS_PER_SYNC + 1);
let resp = role.handle_request(WorkerRequest::Sync {
server_id: "srv-1".to_string(),
heads: oversize,
});

match resp {
WorkerResponse::Denied { reason } => {
assert!(
reason.contains("too many heads"),
"denial reason should mention the cap; got: {reason}"
);
}
other => panic!("expected Denied for oversize heads, got: {other:?}"),
}
}

/// `handle_request(Sync)` must accept exactly `MAX_AUTHORS_PER_SYNC`
/// entries — the cap is inclusive on the legal side.
#[test]
fn sync_request_accepts_exact_cap_heads() {
use willow_common::MAX_AUTHORS_PER_SYNC;
let mut role = ReplayRole::new(ReplayConfig::default());
let (_, _) = setup_server(&mut role, "srv-1");

let at_cap = heads_summary_with_authors(MAX_AUTHORS_PER_SYNC);
let resp = role.handle_request(WorkerRequest::Sync {
server_id: "srv-1".to_string(),
heads: at_cap,
});

// The peer's heads mention authors we don't know, so events_since
// returns the genesis event; our store has 1 author the peer doesn't,
// so they_are_behind is true. Either branch is acceptable — the only
// forbidden outcome is Denied for at-cap input.
if let WorkerResponse::Denied { reason } = resp {
panic!("at-cap heads must not be denied; got reason: {reason}");
}
}

/// When the configured `pending_max_entries` is exceeded, the oldest
/// entries are evicted and `pending_count()` reflects the cap.
#[test]
18 changes: 1 addition & 17 deletions crates/storage/src/store.rs
@@ -20,25 +20,9 @@
//! never reordered or rewritten so existing databases stay consistent.

use rusqlite::{params, Connection};
-use willow_common::SYNC_BATCH_LIMIT;
+use willow_common::{MAX_AUTHORS_PER_SYNC, SYNC_BATCH_LIMIT};
use willow_state::{Event, EventKind, HeadsSummary};

-/// Maximum number of `(author, head)` entries accepted from a peer-supplied
-/// [`HeadsSummary`] in a single `sync_since` / `history` call.
-///
-/// `sync_since` and `history` build their SQL clauses by concatenating one
-/// `(author = ? AND seq <op> ?)` fragment per entry. A peer-supplied summary
-/// with thousands of entries (still well within the transport envelope cap)
-/// would either exceed rusqlite's bind-parameter limit (default 32766) or
-/// waste CPU compiling a giant prepared statement.
-///
-/// 256 is well above any plausible honest server's distinct-author count
-/// while keeping the bind-parameter count (~512 per call) and the SQL
-/// expression-tree depth safely below SQLite's defaults
-/// (32766 binds, depth 1000). Requests over the cap are rejected at the store layer;
-/// the caller in `role.rs` maps the error to a `WorkerResponse::Denied`.
-pub const MAX_AUTHORS_PER_SYNC: usize = 256;

/// Ordered list of schema migrations. Each entry is run inside its own
/// transaction the first time the database is opened. Once a migration is
/// shipped, never edit or reorder it — only append new entries.
5 changes: 4 additions & 1 deletion crates/web/src/app.rs
@@ -922,7 +922,10 @@ pub fn App() -> impl IntoView {

// Request mic permission SYNCHRONOUSLY in the click handler
// to preserve the user gesture chain (required on mobile).
-            let window = web_sys::window().unwrap();
+            let Some(window) = web_sys::window() else {
+                tracing::error!("No browser window context for getUserMedia");
+                return;
+            };
let navigator = window.navigator();
let Ok(media_devices) = navigator.media_devices() else {
tracing::error!("No media devices available");