Code quality review — tracking issue #108

## Summary

This issue tracks the workspace-wide code quality review conducted on 2026-04-10. Sixteen per-crate review agents surveyed every crate in the repo. Every critical- or high-severity finding was then independently verified, in some cases with working exploit code built against the real crates.

**Verification tally:** of 21 top-severity claims, 14 were confirmed, 3 were downgraded in severity, and 4 were agent false positives. The downgrades and false positives are detailed under "Findings that did not survive verification" below.

## Verified bugs with working exploits

Two bugs were reproduced with standalone Rust binaries built against the real crates:

- **`RotateChannelKey` has no permission check.** An outsider (`mallory`) who is not a member, not an admin, and has never interacted with the server can emit a `RotateChannelKey` event and inject arbitrary encrypted-key bytes into `ServerState::channel_keys`. Confirmed with a 60-line binary against `willow-state`. See #109.
- **Ratchet-counter DoS in `open_content`.** `derive_message_key` loops from counter=1 up to the attacker-controlled counter value, doing 2 HKDF-Expand operations per step, **before** AEAD verification. Measured: a counter of 1e6 takes 1.0 s. Extrapolating at that rate, a counter of u64::MAX (≈ 1.8 × 10^13 s of work) would take ≈ 584,000 years on a single core. Any peer subscribed to a channel topic can therefore freeze every recipient's CPU with a single malformed packet. See #110.
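A minimal sketch of the check #109 calls for. It assumes admin-only authority over key rotation, which is not stated here. It also uses hypothetical, simplified stand-ins for `ServerState` and the event handler (`apply_rotate_channel_key`, `PeerId`, `ChannelId`) rather than the real `willow-state` types:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins; the real willow-state identifiers differ.
type PeerId = &'static str;
type ChannelId = &'static str;

#[derive(Debug, PartialEq, Eq)]
pub enum ApplyError {
    NotAuthorized,
}

#[derive(Default)]
pub struct ServerState {
    pub admins: HashSet<PeerId>,
    pub channel_keys: HashMap<ChannelId, Vec<u8>>,
}

impl ServerState {
    /// Apply a RotateChannelKey event only if its author is an admin.
    /// Assumption: admin-only is the intended policy.
    pub fn apply_rotate_channel_key(
        &mut self,
        author: PeerId,
        channel: ChannelId,
        encrypted_key: Vec<u8>,
    ) -> Result<(), ApplyError> {
        // Authority check first, so a rejected event never touches state.
        if !self.admins.contains(&author) {
            return Err(ApplyError::NotAuthorized);
        }
        self.channel_keys.insert(channel, encrypted_key);
        Ok(())
    }
}
```

The design point is ordering. The authority check runs before any mutation, so an event from an outsider like `mallory` is rejected and leaves `channel_keys` untouched.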

## Priority ordering

### Critical — verified, fix first
- [ ] #109 — state: add permission check to `RotateChannelKey`
- [ ] #110 — crypto: bound `ratchet_counter` in `open_content`
- [ ] #111 — state: dedupe reactions by `(author, emoji)`
- [ ] #112 — relay: add timeouts and connection cap to bootstrap HTTP endpoint (Slowloris)
- [ ] #113 — relay: cap topic subscription set and validate topic strings
- [ ] #114 — client: fix `lock().unwrap()` poison panic vectors
- [ ] #115 — client: propagate UUID parse errors during invite join
- [ ] #116 — storage: enable WAL, set synchronous=FULL, add transactions and schema versioning
- [ ] #117 — worker: sign worker wire messages with Ed25519
- [ ] #118 — channel: make `Server::admins` and friends private
- [ ] #119 — network: implement or delete `connection_events()` placeholder
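For #110, one possible shape for the bound is a cheap check on the untrusted counter that runs before any HKDF work. This sketch makes assumptions that the issue does not state: the recipient tracks the highest counter it has accepted per channel key. `MAX_RATCHET_SKIP`, `OpenError`, and `check_ratchet_counter` are hypothetical names, and the limit value is arbitrary.

```rust
/// Hypothetical limit on how far past the highest accepted counter
/// a packet may claim to be. The value here is arbitrary.
pub const MAX_RATCHET_SKIP: u64 = 1_000;

#[derive(Debug, PartialEq, Eq)]
pub enum OpenError {
    CounterTooFar { counter: u64, limit: u64 },
}

/// Reject an untrusted counter before any key derivation runs.
/// `saturating_add` keeps the limit from wrapping near u64::MAX.
pub fn check_ratchet_counter(highest_seen: u64, counter: u64) -> Result<(), OpenError> {
    let limit = highest_seen.saturating_add(MAX_RATCHET_SKIP);
    if counter > limit {
        return Err(OpenError::CounterTooFar { counter, limit });
    }
    Ok(())
}
```

On its own, this caps how far one packet can push the ratchet. Per-packet work only becomes O(`MAX_RATCHET_SKIP`) once derivation resumes from cached state instead of starting from counter=1, which is what #120 covers.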

### Medium — verified, schedule after critical
- [ ] #120 — crypto: cache `derive_message_key` to avoid O(counter) replay
- [ ] #121 — messaging: replace sort-on-insert with an ordered store
- [ ] #122 — state: add `EventHash → index` map for message ops
- [ ] #123 — state: log pending-buffer evictions
- [ ] #124 — app: audit non-mpsc `let _ =` sites in `network_bridge.rs`
- [ ] #125 — app: reorder dedup vs trust check in server op handler
- [ ] #126 — identity: atomic key file write + `0600` perms + permission validation on load
- [ ] #127 — identity + crypto: zeroize `SecretKey` and `ChannelKey` on drop
- [ ] #128 — web: wrap browser-API `.unwrap()` calls in `Result` helpers
- [ ] #129 — web: fix screen-share closure leak in `CallPage`
- [ ] #130 — client: add timeouts to actor `state::select` / `state::mutate` calls
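For #126, the following is a Unix-only sketch of the write-then-rename pattern, with permission validation on load, using only `std`. The function names are hypothetical and the temp-file naming (`with_extension("tmp")`) is an assumption. A real implementation would also need a non-Unix path.

```rust
use std::fs::{self, File, OpenOptions, Permissions};
use std::io::{self, Write};
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Atomically replace `path` with `bytes`; the file ends up with mode 0600.
pub fn write_key_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let tmp = path.with_extension("tmp"); // same directory as the target
    {
        let mut f = OpenOptions::new()
            .write(true)
            .create(true)
            .truncate(true)
            .mode(0o600) // only applied when the file is newly created
            .open(&tmp)?;
        // Enforce 0600 even if a stale temp file existed with other perms.
        f.set_permissions(Permissions::from_mode(0o600))?;
        f.write_all(bytes)?;
        f.sync_all()?; // data is on disk before the rename makes it visible
    }
    fs::rename(&tmp, path)?; // atomic on POSIX within one filesystem
    // Persist the rename itself; skip when the path has no directory part.
    if let Some(dir) = path.parent().filter(|d| !d.as_os_str().is_empty()) {
        File::open(dir)?.sync_all()?;
    }
    Ok(())
}

/// Refuse to load a key file readable or writable by group or others.
pub fn read_key_checked(path: &Path) -> io::Result<Vec<u8>> {
    let mode = fs::metadata(path)?.permissions().mode();
    if mode & 0o077 != 0 {
        return Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            format!("key file {} has mode {:o}, expected 600", path.display(), mode & 0o777),
        ));
    }
    fs::read(path)
}
```

`rename` is only atomic within a single filesystem. That holds here because the temp file sits next to the target.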

### Process
- [ ] #131 — CI: deny `let _ =` on `Result` expressions
- [ ] #132 — docs: one-page "what enforces what" authority spec

## Suggested sequencing

If picking the next two weeks of work, take it in this order:

1. **State authority + crypto DoS** — #109, #110, #111. Smallest diffs, largest security impact, and two of them (#109, #110) have working repro cases.
2. **Relay hardening** — #112, #113. The relay is already deployed, so these close known holes before anyone exploits them.
3. **Client panic vectors + malformed invite fallback** — #114, #115. Silent-failure class, fix together.
4. **Worker wire signatures** — #117. Largest remaining protocol gap; do it before worker topology scales.
5. **Encapsulation + authority spec** — #118 alongside #132. Both address the same confusion about which crate enforces authority.
6. **Storage durability** — #116. Before any real archival deployment.
7. **Mediums** — #120–#130. Schedule as capacity allows.
8. **Process hygiene** — #131 in parallel with whatever's shipping, to stop the bleeding going forward.

## Findings that did not survive verification

For the record, these were flagged by reviewers but then ruled out, or downgraded, after direct testing or re-reading of the code:

- **Transport nested-payload DoS** — a real test binary was built; bincode 1.3 rejects a forged frame with a `u64::MAX` inner Vec length in ~150 ns with no pre-allocation. The existing `MAX_DESER_SIZE` outer cap is sufficient.
- **`SignedMessage` unbounded fields** — `SignedMessage` is always packed/unpacked through `willow_transport::pack`/`unpack`, which bounds the entire blob at 256 KB; `verify()` uses `TryInto<[u8; 32]>`/`TryInto<[u8; 64]>` which reject wrong-length keys/sigs.
- **Actor `Recipient::do_send` semantically wrong** — the dropped-oneshot pattern is documented at `addr.rs:145-147` as correct for any `M::Result` type. The handler runs and computes its value, then the send on the dropped oneshot fails silently. That is exactly fire-and-forget.
- **`PendingBuffer::evict_to` non-deterministic** — `BTreeMap::keys().next()` returns the smallest key in fully deterministic sorted order.
- **Network bridge silent state divergence** — the channel is a `std::sync::mpsc::Sender` (unbounded), so `send()` fails only once the receiver has been dropped, i.e. during shutdown. Downgraded to medium with narrower scope in #124.
- **`PendingBuffer` silent event loss** — `evict_to` already returns the eviction count; its single call site simply discards it. Downgraded to a log-only fix in #123.
- **Dedup-before-trust race in `handle_op`** — the ordering flaw exists but is not exploitable. op_ids are random UUIDs, so an attacker cannot predict future legitimate op_ids. Downgraded to a code-smell reorder in #125.
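Three of the rulings above rest on standard-library guarantees rather than on Willow code. A small, self-contained check of those guarantees follows. The helper names are illustrative only and do not correspond to Willow functions.

```rust
use std::collections::BTreeMap;
use std::convert::TryInto;
use std::sync::mpsc;

/// As in the `verify()` ruling: a slice converts to a fixed-size array
/// only when its length matches exactly; otherwise it is an Err, not a panic.
fn to_array<const N: usize>(bytes: &[u8]) -> Option<[u8; N]> {
    bytes.try_into().ok()
}

/// As in the `evict_to` ruling: `BTreeMap::keys()` yields keys in ascending
/// order, so `next()` is always the smallest key, whatever the insert order.
fn smallest_key(map: &BTreeMap<u64, &str>) -> Option<u64> {
    map.keys().next().copied()
}

/// As in the network-bridge ruling: an unbounded std mpsc sender never
/// reports "full"; `send()` errors only after the receiver is dropped.
/// Returns (send succeeded while alive, send succeeded after drop).
fn send_before_and_after_drop() -> (bool, bool) {
    let (tx, rx) = mpsc::channel::<u32>();
    let before = tx.send(1).is_ok();
    drop(rx);
    let after = tx.send(2).is_ok();
    (before, after)
}
```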

## Process recommendations

- **Stop the `let _ = Result` bleeding.** The silent-error pattern is endemic and cheap to lint. Tracked in #131.
- **Write a short "what enforces what" spec.** Several reviewers independently confused the split between `willow-channel` (data model) and `willow-state` (authority). Tracked in #132.
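One way to implement #131 is Clippy's `let_underscore_must_use` lint. It flags `let _ = expr` whenever `expr` is `#[must_use]`, which includes `Result`, so it is somewhat broader than a Result-only rule. The lint sits in Clippy's `restriction` group and must be enabled explicitly.

The sketch below shows the crate-level opt-in. The `cleanup` helper is hypothetical and only illustrates the pattern the lint rejects next to an accepted alternative. The same lint could instead be set workspace-wide under `[workspace.lints.clippy]` in `Cargo.toml` (Cargo 1.74+).

```rust
// Crate root (lib.rs / main.rs). Off by default; enabled explicitly here.
#![deny(clippy::let_underscore_must_use)]

use std::fs;
use std::io;

/// Hypothetical helper: remove a scratch file, reporting whether it worked.
fn cleanup(path: &str) -> bool {
    // Rejected by the lint: the Result is silently dropped.
    // let _ = fs::remove_file(path);

    // Accepted: the error is inspected, and unexpected failures are surfaced.
    match fs::remove_file(path) {
        Ok(()) => true,
        Err(e) if e.kind() == io::ErrorKind::NotFound => false,
        Err(e) => {
            eprintln!("cleanup of {path} failed: {e}");
            false
        }
    }
}
```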

---

Each child issue is scoped to roughly one PR and has the affected file and line range, a fix sketch, and a test to add. Pick one off the list and go.
