auto-fix batch 2026-04-28-002530#457
Merged
Merged
Conversation
TokenScope::allows_resource returned true for every URI, so ReadOnly and Messaging tokens could read willow://server/join-links and harvest link_id values. link_id is a single-step bearer credential for WireMessage::JoinRequest, so any client with a low-privilege token could join a server unattended (audit finding AUD-2). Per-scope match now: Full/Admin = all URIs, ReadOnly/Messaging = all except willow://server/join-links, Custom(set) = explicit allowlist (now also gates resources, not just tools — doc updated). Tests added at lowest tier (Rust unit + agent integration): - scopes::tests::readonly_scope_denies_join_links - scopes::tests::messaging_scope_denies_join_links - scopes::tests::full_and_admin_allow_join_links - scopes::tests::custom_resource_allowlist_gates_join_links - e2e::readonly_list_resources_omits_join_links - e2e::readonly_scope_rejects_join_links_read (replaces the prior stub-closure denied_uri_rejects_with_invalid_request — now drives the gate with a real WillowMcpServer<ReadOnly>) - readonly_token_hides_tools updated to expect join-links denied while every other URI stays visible. Tradeoff: option (a) from the issue tightens Custom(set) to also gate resources, not tools only. Runner-up was leaving Custom unchanged and only filtering join-links in ReadOnly/Messaging — rejected because the issue's preferred patch makes Custom an explicit allowlist for both surfaces, which is the safer default for least-privilege tokens. Refs #436 Co-authored-by: Claude <noreply@anthropic.com>
SEC-V-08. PendingBuffer had only a global cap (5_000 in client). One chatty signer could fill 100% of the slots with unresolved-prev events and starve legitimate out-of-order events for the whole 1 h eviction window. Adds a per-author sub-cap. Default = max_entries / 50 (200 at the client cap, 100 with the state-crate default of 10_000), with a floor of 1 so tiny test caps remain usable. When an author is at the sub-cap, further events from that author are dropped without touching other authors' buckets. Bookkeeping mirrors the existing cached_count pattern: per_author_count BTreeMap<EndpointId, usize> updated on insert / resolve / evict_expired / evict_to. Warn-once-per-author via warned_full_authors set so a single offender cannot flood logs. Why a config knob and not a hardcoded constant: tests need to override the sub-cap to exercise the global eviction policy in isolation. with_per_author_cap(usize) is the explicit override; production callers stay on the derived default. Runner-up was a hardcoded constant — rejected because three existing tests would either need multi-author rewriting or magic-number tuning, and the override is cheaper to read. Tests at the lowest tier (state crate, sync.rs): * per_author_cap_isolates_one_signer * per_author_cap_defaults_to_total_over_divisor * per_author_cap_floor_is_one * per_author_cap_warns_once_per_author * per_author_cap_frees_on_resolve Adapted three existing capacity-eviction tests to set an explicit sub-cap (so they keep testing the global policy): * sync::tests::pending_buffer_auto_evicts_when_limit_exceeded * sync::tests::capacity_eviction_drops_oldest_when_exceeded * sync::tests::pending_count_reflects_both_eviction_policies * tests::pending_buffer_eviction_reduces_count_to_cap Reduced deep_pending_chain_does_not_stack_overflow chain depth from 3000 to 1500 so it fits within the 100_000-cap'd ManagedDag's per- author budget of 2000. Test still proves the iterative resolution path; capacity policy is validated separately. Refs #237 Co-authored-by: Claude <noreply@anthropic.com>
Anti-DoS caps for two attacker-controlled vectors that fan out via .clone() into per-peer state. Ref #236. Caps + tier: - MAX_EVENT_DEPS = 64. EventDag::insert rejects oversize deps with InsertError::DepsTooLong. Author-side path covered too because every event reaches `insert` before reaching state. - MAX_ENCRYPTED_KEY_BYTES = 128. EventDag::insert rejects oversize per-blob payloads inside RotateChannelKey.encrypted_keys with InsertError::EncryptedKeyTooLarge. One X25519-sealed key fits well under that. - MAX_ENCRYPTED_KEYS_OVER_MEMBERS = 4. apply_event for RotateChannelKey rejects when encrypted_keys.len() exceeds state.members.len() + epsilon. Needs state context, so it lives at apply time alongside the existing member gate. Tier choice: caps that need no state context go at the DAG insert boundary so they fire regardless of which call path produced the event. The member-count cap requires `state.members` and naturally slots into the existing apply_event Rejected branch. Runner-up: stuffing all three into Event::new. Rejected because Event::new is currently infallible and changing it cascades to ~25 production + test callsites for no extra coverage — every event already passes through dag.insert before mutating state. willow-replay matches on InsertError exhaustively; added warn arms for the two new variants. Tests (state crate, 5 added): - dag_insert_rejects_deps_over_cap - dag_insert_accepts_deps_at_cap - dag_insert_rejects_oversized_encrypted_key - apply_rotate_channel_key_rejects_excess_entries_over_member_count - apply_rotate_channel_key_accepts_at_member_count_plus_epsilon Local: cargo fmt --check, cargo clippy --workspace --all-targets -D warnings, cargo test --workspace, cargo check --target wasm32-unknown-unknown all green. https://claude.ai/code/session_01VdcwSdvqig423A5CX2fSzR Co-authored-by: Claude <noreply@anthropic.com>
Three layered caps on the TopicAnnounce listener: - MAX_TOPICS_PER_ANNOUNCE = 64 — drop announce before any per-topic work runs. Prior code iterated unbounded; a 256 KB envelope of 2-byte topics meant ~128 000 blake3 hashes per message (CPU amplification). - MAX_TOPICS_PER_SIGNER = 100 with per-signer LRU eviction. Stops one peer from monopolising the global slot table (MAX_TOPICS = 10 000) and starving legitimate clients. Hand-rolled VecDeque + HashMap refcount: insert appends, eviction pops front, refcount==0 triggers network.unsubscribe so the global slot is actually freed. - WARN_RATE_LIMIT = 60s. Replaces the once-per-session warned_full flag so operators see ongoing pressure without log spam. Applied to per-message-cap, global-cap, and per-signer-eviction warns. Refactored listener body around a pure AnnounceState struct that returns TopicActions describing what to do on the network — keeps the state machine unit-testable without driving MemNetwork at 10 000-topic scale. Tests added (relay-tier, lowest covering behaviour): - topic_announce_listener_rejects_oversized_announce — integration test: 65-topic announce dropped, sentinel announce still works. - announce_state_per_signer_lru_evicts_oldest — fills 100 topics for one signer, the 101st evicts t0 and subscribes to the new one. - announce_state_per_signer_lru_does_not_starve_other_signers — A fills its quota, B can still subscribe. - announce_state_repeat_announce_promotes_lru_no_resubscribe — LRU touch on re-announce, no network call. - announce_state_shared_topic_refcount_keeps_subscription — refcount keeps subscription alive when one signer evicts. - announce_state_rejects_at_global_cap — fills 10 000 slots across multiple signers, fresh signer's new topic rejected. - should_emit_warn_rate_limits_to_one_per_window — directly exercises the rate-limit helper. - topic_announce_listener_enforces_max_topics_cap removed (its single- announce-of-10001 setup is now blocked by the per-message cap; the global-cap behaviour is exercised at the unit tier instead). Tradeoff considered: enforcing the per-message cap in willow-common's unpack_wire would protect every consumer, but bincode does not expose inline length caps without a custom Visitor and the relay is the only production consumer; defense-in-depth at the wire layer is a follow-up if we add more consumers. Per-message cap lives in the relay listener where the load-bearing work happens. Refs #235 Co-authored-by: Claude <noreply@anthropic.com>
Replace mutable tags (`rust:latest`, `rust:slim`, `nginxinc/nginx-unprivileged:alpine`) with digest-pinned references so container builds are reproducible and resistant to upstream tag re-push or registry takeover. Pinned (verified via `docker buildx imagetools inspect` 2026-04-28): - rust:1.95-slim-bookworm @sha256:caaf9ca7acd474892186860307d6f28e51fdbc1a4eada459fcff81517cf46a36 - nginxinc/nginx-unprivileged:1.27-alpine @sha256:65e3e85dbaed8ba248841d9d58a899b6197106c23cb0ff1a132b7bfe0547e4c0 Both builder and runtime stages now use the same pinned rust slim image across relay/replay/storage/web; web runtime uses the pinned nginx unprivileged variant. Each `FROM` carries an inline comment recording the version, pin date, and bump command for traceability. Considered alternative: pinning runtime stages to `debian:bookworm-slim` (closer to true minimal base). Rejected for this PR — out of scope per issue #313, which asks only to pin the existing `FROM` lines without restructuring the multi-stage layout. Switching runtime base belongs to a separate change. SBOM stage explicitly out of scope per the issue body. Refs #313 Co-authored-by: Claude <noreply@anthropic.com>
migration insert used `.unwrap_or(0)` on `SystemTime::now` duration. clock pre-1970 means `applied_at_ms = 0` lands in schema_version, silent. rest of crate returns `anyhow::Result`, so just `?` it. only one site in the storage crate, no helper needed. test: extend `schema_version_table_exists_after_open` to read back `applied_at_ms` and assert > 0. pins behaviour against the legacy `0` fallback under normal clock. error-branch test would need time mocking the crate doesn't have, out of scope. TD-08, refs #254 Co-authored-by: Claude <noreply@anthropic.com>
…ional three lessons from batch 2026-04-28-002530: 1. coordinator must `git fetch + reset` master batch branch before each implementer dispatch — stale local state contaminates the next worktree (#451 implementer found half-applied prior work in fresh worktree). 2. when implementer's pre-flight detects upstream fix already landed (#316/#317/#318 vs PR #402), close issues + caveman-comment, do NOT include in master PR `Fixes` list — record under `## Already-Fixed` instead. 3. github webhook subscriptions for sub-PRs are informational only — implementer owns its merge gate, coordinator must not duplicate that work.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
auto-fix batch run. 6 sub-PRs squash-merged into this branch. master PR runs full CI on merge. all sub-PR fixes ride together; merge → linked issues auto-close.
Resolved
willow://server/join-linksresource (sub-PR fix(agent): readonly scope must deny join_links resource #450). per-scopeallows_resourcematch +list_resourcesfilter; in-crate unit tests +WillowMcpServer<ReadOnly>integration tests.PendingBuffer(sub-PR fix(state): per-author cap for pending buffer (SEC-V-08) #451). defaultmax_per_author = total / 50, warn-once-per-author, decrement on resolve / evict / cap-trim. SEC-V-08.RotateChannelKey.encrypted_keys+Event.depsvectors have no element caps #236 — caps onRotateChannelKey.encrypted_keys+Event.deps(sub-PR fix(state): cap RotateChannelKey + Event.deps (SEC-V-07) #452). consts:MAX_EVENT_DEPS=64,MAX_ENCRYPTED_KEY_BYTES=128,MAX_ENCRYPTED_KEYS_OVER_MEMBERS=4. deps + per-blob caps inEventDag::insert, member-count cap inapply_event. SEC-V-07.TopicAnnounce.topics: Vec<String>has no element-count cap; enables relay CPU amplification + topic-slot exhaustion #235 —TopicAnnounceper-message cap + per-signer LRU + rate-limited warn (sub-PR fix(relay): cap TopicAnnounce + per-signer slots (SEC-V-06) #453). consts:MAX_TOPICS_PER_ANNOUNCE=64,MAX_TOPICS_PER_SIGNER=100,WARN_RATE_LIMIT=60s. hand-rolled LRU viaHashMap<EndpointId, VecDeque<String>>+ global refcount → unsubscribes when refcount hits zero (closes the "topics never freed" complaint). SEC-V-06.rust:latest/rust:slim/nginx:alpinetags #313 — pin docker base images by digest (sub-PR chore(docker): pin base images by digest (DEP-02) #455). all four Dockerfiles now pinrust:1.95-slim-bookworm@sha256:caaf9ca7...46a36andnginxinc/nginx-unprivileged:1.27-alpine@sha256:65e3e85d...e4c0. digests fetched viadocker buildx imagetools inspect. DEP-02.timestamp_hint_msfallback to0on clock error during migration insert #254 —applied_at_msclock-error propagated asanyhow::Errorinstead of0fallback (sub-PR fix(storage): propagate clock error in applied_at_ms (TD-08) #456). storage-tier test extended to assertapplied_at_ms > 0. TD-08.Already-Fixed
paste 1.0.15(RUSTSEC-2024-0436) — not in CI ignore list #316 / [DEP-09] Unmaintainedproc-macro-error 0.4.12(RUSTSEC-2024-0370) — not in CI ignore list #317 / [DEP-10] Unmaintainedatomic-polyfill 1.0.3(RUSTSEC-2023-0089) — not in CI ignore list #318 — three RUSTSEC unmaintained-dep ignores. landed already onmainvia PR ci(audit): ignore unmaintained transitive advisories (#316/#317/#318) #402 (commit3c65127) before this batch started. implementer pre-flight caught it viacargo tree -i+ read of.github/workflows/ci.yml:168-170. issues closed as completed with caveman comments naming the parent crates per advisory. no sub-PR needed; not underFixeshere because already closed.Skill Evolution
three lessons folded into
.claude/skills/resolving-issues/SKILL.md(commitd7d26a9):git fetch + reset --hard origin/<master-batch>before each implementer dispatch. prior implementers' merges + the session-open commit must be the worktree base. surfaced because fix(state): per-author cap for pending buffer (SEC-V-08) #451's implementer found half-applied prior work in a fresh worktree — stale local state contaminates the next worktree.## Already-Fixedin the master PR (NOT underFixes, since the issue is already closed). surfaced by [DEP-08] Unmaintainedpaste 1.0.15(RUSTSEC-2024-0436) — not in CI ignore list #316/[DEP-09] Unmaintainedproc-macro-error 0.4.12(RUSTSEC-2024-0370) — not in CI ignore list #317/[DEP-10] Unmaintainedatomic-polyfill 1.0.3(RUSTSEC-2023-0089) — not in CI ignore list #318 vs PR ci(audit): ignore unmaintained transitive advisories (#316/#317/#318) #402.<github-webhook-activity>arrives for a sub-PR opened by an implementer, the implementer owns its merge gate. coordinator does not investigate CI / review state — that's the implementer's still-running job. acknowledge briefly + keep waiting. surfaced by every sub-PR in this run; coordinator was being prompted to duplicate work the implementer was already doing.Lessons Learned
what worked:
crates/state/src/tests.rs), relay-tier (crates/relay/src/lib.rs), agent-tier (crates/agent/tests/e2e.rs), storage-tier (crates/storage/src/store.rs) tests. zero playwright bloat; everything ran undercargo test --workspace.cargo tree -i+ reading existing CI ignore list saved a dead PR for [DEP-08] Unmaintainedpaste 1.0.15(RUSTSEC-2024-0436) — not in CI ignore list #316/[DEP-09] Unmaintainedproc-macro-error 0.4.12(RUSTSEC-2024-0370) — not in CI ignore list #317/[DEP-10] Unmaintainedatomic-polyfill 1.0.3(RUSTSEC-2023-0089) — not in CI ignore list #318 (already shipped). this is the lessons-learned guidance from general-audit lessons: 2026-04-27 (run 2) #438 (codify orchestrator pre-fetch) bearing fruit one run later.rust:latest/rust:slim/nginx:alpinetags #313 (docker) came back from a sandbox withoutdockerdaemon — butdocker buildx imagetools inspect(registry-only, no daemon) worked. lesson surfaced: registry-API is the load-bearing tool, not the docker daemon.what didn't:
cargo auditwas not run by the coordinator before dispatching the deps cluster — saved by the implementer's pre-flight, but the coordinator should have caught it earlier. partially codified by skill edit Async client + UI refactor: eliminate polling, context-based state #2; the implementer's pre-flight is now an explicit fallback.numbers:
d7d26a9RotateChannelKey.encrypted_keys+Event.depsvectors have no element caps #236, [SEC-V-06]TopicAnnounce.topics: Vec<String>has no element-count cap; enables relay CPU amplification + topic-slot exhaustion #235, [DEP-02] Docker images use unpinnedrust:latest/rust:slim/nginx:alpinetags #313, [TD-08]timestamp_hint_msfallback to0on clock error during migration insert #254)paste 1.0.15(RUSTSEC-2024-0436) — not in CI ignore list #316, [DEP-09] Unmaintainedproc-macro-error 0.4.12(RUSTSEC-2024-0370) — not in CI ignore list #317, [DEP-10] Unmaintainedatomic-polyfill 1.0.3(RUSTSEC-2023-0089) — not in CI ignore list #318)d7d26a9)next runs should pick up the deferred candidates from this queue: #309 (JoinResponse inviter binding, security audit), #327 (dead_code cluster cleanup), #320 (TODO comments cluster), and the rest of the audit-finding queue.
Generated by Claude Code