Skip to content

Code quality review — tracking issue #108

@intendednull

Description

@intendednull

Summary

This is the tracking issue for the workspace-wide code quality review conducted on 2026-04-10. Sixteen per-crate review agents surveyed every crate in the repo, and every critical/high finding was independently verified — some with working exploit code built against the real crates.

Verification tally: 14 confirmed, 3 severity-downgraded, 4 agent false positives out of ~20 top-severity claims. Details in the section below.

Verified bugs with working exploits

Two bugs were reproduced with standalone Rust binaries built against the real crates:

  • RotateChannelKey has no permission check. An outsider (mallory) who is not a member, not an admin, and has never interacted with the server can emit a RotateChannelKey event and inject arbitrary encrypted-key bytes into ServerState::channel_keys. Confirmed with a 60-line binary against willow-state. See [state] RotateChannelKey has no permission or membership check #109.
  • Ratchet-counter DoS in open_content. derive_message_key loops from counter=1 to the attacker-controlled value, doing 2 HKDF-Expand ops per step, before AEAD verification. Measured: 1e6 counter = 1.0 s; u64::MAX counter ≈ 584 000 years on a single core. Any peer subscribed to a channel topic can freeze every recipient's CPU with one malformed packet. See [crypto] Bound ratchet_counter in open_content to fix CPU DoS #110.

Priority ordering

Critical — verified, fix first

Medium — verified, schedule after critical

Process

Suggested sequencing

If picking the next two weeks of work in order:

  1. State authority + crypto DoS[state] RotateChannelKey has no permission or membership check #109, [crypto] Bound ratchet_counter in open_content to fix CPU DoS #110, [state] Reaction event does not dedupe by author #111. Smallest diffs, largest security impact, and two of them have working repro cases.
  2. Relay hardening[relay] Bootstrap HTTP endpoint has no timeout or connection cap (Slowloris) #112, [relay] Topic subscription set grows without bound #113. The relay is already deployed; these are pre-exploit fixes.
  3. Client panic vectors + malformed invite fallback[client] Fix three lock().unwrap() poison vectors on join_links #114, [client] Propagate UUID parse errors during invite join instead of silently regenerating #115. Silent-failure class, fix together.
  4. Worker wire signatures[worker] Sign WorkerWireMessage with Ed25519 like server ops #117. Largest remaining protocol gap; do it before worker topology scales.
  5. Encapsulation + authority spec[channel] Make Server::admins and other public mutable fields private #118 alongside [docs] Write one-page authority spec: what enforces what #132. They address the same confusion.
  6. Storage durability[storage] Enable WAL, synchronous=FULL, transactions, and schema versioning #116. Before any real archival deployment.
  7. Mediums[crypto] Cache derive_message_key to avoid O(counter) replay #120[client] Add timeouts to actor state::select / state::mutate calls #130. Schedule as capacity allows.
  8. Process hygiene[ci] Deny let _ = on Result expressions #131 in parallel with whatever's shipping, to stop the bleeding going forward.

Findings that did not survive verification

For the record, these were flagged by reviewers but ruled out by direct testing or re-reading:

  • Transport nested-payload DoS — a real test binary was built; bincode 1.3 rejects a forged frame with a u64::MAX inner Vec length in ~150 ns with no pre-allocation. The existing MAX_DESER_SIZE outer cap is sufficient.
  • SignedMessage unbounded fieldsSignedMessage is always packed/unpacked through willow_transport::pack/unpack, which bounds the entire blob at 256 KB; verify() uses TryInto<[u8; 32]>/TryInto<[u8; 64]> which reject wrong-length keys/sigs.
  • Actor Recipient::do_send semantically wrong — the dropped-oneshot pattern is documented on addr.rs:145-147 as correct for any M::Result type. Handler runs, value is computed, oneshot send silently fails. That is exactly "fire and forget".
  • PendingBuffer::evict_to non-deterministicBTreeMap::keys().next() returns the smallest key in fully deterministic sorted order.
  • Network bridge silent state divergence — the channel is std::sync::mpsc::Sender (unbounded), so send() only fails during shutdown. Downgraded to medium with narrower scope in [app] Audit let _ = sites in network_bridge.rs for swallowed errors #124.
  • PendingBuffer silent event lossevict_to actually returns the eviction count; only the single call site discards it. Downgraded to a log-only fix in [state] Log pending-buffer evictions instead of silently discarding #123.
  • Dedup-before-trust race in handle_op — mechanism exists but is not exploitable with random-UUID op_ids; an attacker cannot predict future legitimate op_ids. Downgraded to a code-smell reorder in [app] Reorder dedup-before-trust check in handle_op #125.

Process recommendations


Each child issue is scoped to roughly one PR and has the affected file and line range, a fix sketch, and a test to add. Pick one off the list and go.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions