auto-fix batch claude/friendly-maxwell-cCGXk 2026-05-04 #598
Merged
The audit (issue #575) flagged `verify` for using ed25519-dalek's non-strict API. Investigation: `iroh_base::PublicKey::verify`, which our standalone `verify` delegates to, already calls `verify_strict` internally (iroh-base 0.98.0 src/key.rs:134-136). The vulnerability does not exist in the deployed call chain, but the invariant is load-bearing and silently breakable by future refactors. Defense in depth: document the strict-mode dependency at the call site so anyone changing this function knows the contract, and add a regression test that constructs a non-canonical S component (top bit of byte 63 set, pushing S above the curve order ℓ) and asserts `verify` rejects it. The test pins the malleability vector closed regardless of whether the underlying iroh-base wrapper changes. Refs #575
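The non-canonical S vector described above can be illustrated without any crypto library. The following is a minimal sketch, not the code in ed25519-dalek or iroh-base: `has_canonical_s` and `flip_top_bit_of_s` are hypothetical helper names, and the check covers only the scalar range test (S strictly below the group order ℓ), not the rest of signature verification.

```rust
/// Little-endian encoding of the Ed25519 group order
/// l = 2^252 + 27742317777372353535851937790883648493.
const ED25519_ORDER_LE: [u8; 32] = [
    0xed, 0xd3, 0xf5, 0x5c, 0x1a, 0x63, 0x12, 0x58,
    0xd6, 0x9c, 0xf7, 0xa2, 0xde, 0xf9, 0xde, 0x14,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x10,
];

/// Hypothetical helper: true when the S half (bytes 32..64) of a 64-byte
/// signature encodes a scalar strictly below the group order.
fn has_canonical_s(signature: &[u8; 64]) -> bool {
    let s = &signature[32..];
    // Compare from the most significant byte (index 31) downwards.
    for i in (0..32).rev() {
        if s[i] < ED25519_ORDER_LE[i] {
            return true;
        }
        if s[i] > ED25519_ORDER_LE[i] {
            return false;
        }
    }
    // S equal to the order is also non-canonical.
    false
}

/// Hypothetical helper: sets or clears the top bit of signature byte 63,
/// which is the most significant bit of S.
fn flip_top_bit_of_s(mut signature: [u8; 64]) -> [u8; 64] {
    signature[63] ^= 0x80;
    signature
}
```

Because a canonical S is below ℓ, which is itself below 2^253, its top bit is always clear; flipping that bit therefore always produces a value above ℓ, which is why the regression test can use it as a reliable malleability probe.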
EventDag::insert previously called event.verify() (Ed25519 +
bincode + blake3, ~50us) BEFORE the cheap structural caps
(MAX_EVENT_DEPS, MAX_ENCRYPTED_KEY_BYTES). An attacker holding any
SendMessages permission could broadcast events with pathologically
large deps; every receiver paid full sig-verify cost on each
rejected event.
Reorder so cheap O(1)/O(n) syntactic checks fire first and the
crypto verify path only runs once an event passes the structural
caps. The intent comment at the top of insert() already described
this design ("Reject at the inbound DAG boundary so over-cap events
never even reach applied_events"); this commit makes the code match
that intent.
Adds a regression test that constructs an event with deps.len() >
MAX_EVENT_DEPS and a clobbered (all-zero) signature; the test
asserts InsertError::DepsTooLong (not InvalidSignature), which
proves the structural cap short-circuits before sig-verify.
Pre-existing willow-state::tests_materialize::non_admin_set_profile_is_accepted
failure (regression from PR #505, tracked at #565) is unrelated to
this change and was already failing on HEAD.
Refs #519
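The reordering can be sketched in isolation. This is an illustrative model only, not the willow-state implementation: the `Event` and `EventDag` shapes, the `verify_calls` counter, the stand-in `verify`, and the value chosen for `MAX_EVENT_DEPS` are all assumptions; only the error variant names and the check ordering come from the commit message.

```rust
/// Assumed value for illustration; the real cap is defined in the crate.
const MAX_EVENT_DEPS: usize = 64;

#[derive(Debug, PartialEq)]
enum InsertError {
    DepsTooLong,
    InvalidSignature,
}

/// Hypothetical, simplified event shape.
struct Event {
    deps: Vec<[u8; 32]>,
    signature: [u8; 64],
}

/// Hypothetical DAG that counts how often the expensive path runs.
struct EventDag {
    verify_calls: u32,
    accepted: usize,
}

impl EventDag {
    fn new() -> Self {
        EventDag { verify_calls: 0, accepted: 0 }
    }

    /// Stand-in for the real Ed25519 check: treats an all-zero
    /// signature as invalid and records that it was invoked.
    fn verify(&mut self, event: &Event) -> bool {
        self.verify_calls += 1;
        event.signature.iter().any(|b| *b != 0)
    }

    fn insert(&mut self, event: Event) -> Result<(), InsertError> {
        // Cheap structural cap first: O(1) and needs no crypto.
        if event.deps.len() > MAX_EVENT_DEPS {
            return Err(InsertError::DepsTooLong);
        }
        // Expensive signature verification only for events that pass.
        if !self.verify(&event) {
            return Err(InsertError::InvalidSignature);
        }
        self.accepted += 1;
        Ok(())
    }
}
```

An over-cap event with a zeroed signature returns `DepsTooLong` and leaves `verify_calls` at 0, which is the same property the regression test asserts through the error variant.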
Content::File carried a self-declared u64 size_bytes plus unbounded filename / mime_type strings, all peer-supplied. Until now nothing rejected a 256 KB filename or MIME, and the size field had no warning that it was attacker-declared. Add MAX_FILENAME_BYTES = 255 (POSIX NAME_MAX) and MAX_MIME_BYTES = 255 (POSIX-aligned, comfortably above RFC 6838's 127+127 type/subtype limit) and expose Content::validate / Message::validate that reject oversized values with MessageValidationError. Wire validation into InMemoryStore::insert so peer-supplied messages cannot be persisted without first clearing the structural bounds. Document size_bytes as advisory-only — UIs may display it but must not use it for any preallocation or trust decision. The natural earlier ingress point sits in client/listeners.rs, but that file is locked under PR #566; an inline NOTE in store.rs flags the follow-up so the client side can also gate validation once #566 lands. No size_bytes preallocation hazards were found in the tree. Refs #583
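A rough sketch of what such a bounds check might look like follows. The two constants and the names `Content::validate` and `MessageValidationError` come from the commit message; the enum variants, the field layout of `Content::File`, and the `Text` variant are assumptions made for illustration.

```rust
const MAX_FILENAME_BYTES: usize = 255;
const MAX_MIME_BYTES: usize = 255;

/// Variant names here are hypothetical.
#[derive(Debug, PartialEq)]
enum MessageValidationError {
    FilenameTooLong { len: usize },
    MimeTooLong { len: usize },
}

/// Simplified, assumed shape of the message content type.
#[allow(dead_code)]
enum Content {
    Text(String),
    File {
        filename: String,
        mime_type: String,
        /// Advisory only: declared by the peer, never trusted for allocation.
        size_bytes: u64,
    },
}

impl Content {
    fn validate(&self) -> Result<(), MessageValidationError> {
        match self {
            Content::File { filename, mime_type, .. } => {
                // `String::len` counts bytes, matching the byte-based caps.
                if filename.len() > MAX_FILENAME_BYTES {
                    return Err(MessageValidationError::FilenameTooLong { len: filename.len() });
                }
                if mime_type.len() > MAX_MIME_BYTES {
                    return Err(MessageValidationError::MimeTooLong { len: mime_type.len() });
                }
                Ok(())
            }
            // Non-file variants carry nothing these caps apply to.
            _ => Ok(()),
        }
    }
}
```

Note that `size_bytes` is deliberately not checked: under the advisory-only rule it is display data, so there is no bound that would make it safe to trust.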
PR #243 added referrerpolicy="no-referrer" to external <img> tags in the chat auto-embed code path (crates/web/src/components/message.rs), signalling clear intent to render external images while protecting referrer privacy. CSP img-src 'self' data: blob: blocked them in the browser, so embeds silently failed. Add https: (HTTPS only — http: would risk a mixed-content downgrade on the page) to img-src, restoring the design intent. is_image_url() also accepts http:// schemes; the resulting CSP/filter mismatch is out-of-scope for this fix and will be tracked separately if it proves to be a real path peers exercise. Browser-side CSP enforcement was not testable in the dispatch sandbox (no Chrome/Firefox); change is verified syntactically and via cargo fmt. Existing Playwright e2e CI gate on the master branch will catch any HTML / CSP regressions on merge. Refs #584
WorkerCache called std::time::Instant::now() in a willow-client lib crate. willow-client must build for wasm32-unknown-unknown per the dual-target rule, and std::time::Instant compiles on WASM but its monotonic source is not wired through Performance.now() without explicit gating. Documented trip-hazard. Replace std::time::Instant with web_time::Instant. API-compatible drop-in: native uses std::time::Instant, WASM uses Performance.now() automatically. No behaviour change on native; correct semantics on WASM. TTL eviction continues to work on both targets (gating it out on WASM would have left dead workers in the cache, defeating TTL). Picked over cfg-gate-eviction (wrong semantics on web) and HLC-derived clock (wider scope, no benefit). Both rejected in favour of the minimal-change portable replacement. Cargo.lock churn: web-time was already a transitive dep; the only addition is willow-client's direct entry. Strictly additive. Test gap: wasm-bindgen-test is not wired into willow-client, so the wasm-target test from the audit suggestion was skipped. Coverage is provided by `cargo check --target wasm32-unknown-unknown -p willow-client` and the matching clippy gate, both of which pass and catch the original compile-time linkage concern. Refs #545
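To show why TTL eviction needs a working monotonic clock on every target, here is a small sketch of a TTL cache. It uses `std::time::Instant` so that it runs standalone; in a dual-target crate the `Instant` import would come from `web_time` instead, as the commit describes. The `WorkerCache` fields and the `touch` / `evict_expired` methods are assumptions, not the actual willow-client API.

```rust
use std::collections::HashMap;
// In a crate that must also build for wasm32-unknown-unknown, the commit
// swaps this `Instant` for `web_time::Instant`.
use std::time::{Duration, Instant};

/// Hypothetical, simplified cache: maps a worker id to when it was last seen.
struct WorkerCache {
    ttl: Duration,
    last_seen: HashMap<String, Instant>,
}

impl WorkerCache {
    fn new(ttl: Duration) -> Self {
        WorkerCache { ttl, last_seen: HashMap::new() }
    }

    /// Records that `id` was seen at `now`.
    fn touch(&mut self, id: &str, now: Instant) {
        self.last_seen.insert(id.to_string(), now);
    }

    /// Drops entries whose age has reached the TTL; returns how many were removed.
    fn evict_expired(&mut self, now: Instant) -> usize {
        let ttl = self.ttl;
        let before = self.last_seen.len();
        self.last_seen.retain(|_, seen| now.duration_since(*seen) < ttl);
        before - self.last_seen.len()
    }
}
```

Passing `now` in as a parameter keeps the sketch testable without sleeping; if eviction were compiled out on one target, `evict_expired` would never run there and stale workers would accumulate, which is the behaviour the commit rejects.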
The skill says to "skip files in flight" when a fix overlaps an in-flight PR. Trip hazard: Cargo.lock is in PR X's file list, your fix needs a new dep, and the implementer assumes the file is locked and aborts. That is wrong: Cargo.lock additions are strictly additive (your rows are appended, theirs stay), so the merge is trivial. Refusing on Cargo.lock alone creates an infinite "wait for PR X" loop and blocks dep upgrades and wasm fixes indefinitely. Note the churn in the commit body, ship the dep, and let the merge handle it. The same logic applies to the workspace Cargo.toml [workspace.dependencies] table, which is also additive only. The coordinator brief should pre-authorise this. Surfaced in the PR #545 dispatch (web_time::Instant for the WorkerCache wasm fix), where the coordinator brief explicitly authorised the Cargo.lock churn so the implementer did not abort.
Commit 39a9f1d updated index.html CSP img-src to include https: (for auto-embed support per #584), but the static_assets integration test asserts the exact pre-change substring `img-src 'self' data: blob:` is present. The substring no longer appears verbatim, so CI's Test job fails. Update REQUIRED_CSP_DIRECTIVES to match the production CSP. Refs #584
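The failure mode is easy to reproduce with a plain substring check. The sketch below assumes a simplified form of the test: `REQUIRED_CSP_DIRECTIVES` is the constant named in the commit, while `missing_directives` and the sample HTML strings are hypothetical and do not reflect the full contents of `static_assets.rs` or `index.html`.

```rust
/// Updated expectation, matching the post-change production CSP.
const REQUIRED_CSP_DIRECTIVES: &[&str] = &["img-src 'self' https: data: blob:"];

/// Hypothetical helper: returns every required directive that does not
/// appear verbatim in the given HTML.
fn missing_directives<'a>(html: &str, required: &[&'a str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|directive| !html.contains(*directive))
        .collect()
}
```

Because the match is on an exact substring, inserting `https:` between `'self'` and `data:` breaks the old expectation even though every old token is still present, so the expectation has to move in lockstep with the HTML.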
The #584 brief told the implementer to skip cargo test: "no Rust code change, HTML only." That was wrong. crates/web/tests/static_assets.rs asserts an exact CSP directive substring in index.html. Changing the CSP moved the substring, and the test went red on master-PR CI (#598 Test job failed). Cost: ~10 min of CI rescue dispatch plus an extra commit on the master branch. Avoidable: even a non-Rust change needs cargo test on crates whose integration tests read static assets at runtime (HTML/CSS/JSON/TOML/YAML/sw.js). The skill now mandates cargo test -p <touched-crate> for any file in crates/<x>/ regardless of language. It takes a few seconds locally and saves a red CI run, a rescue dispatch, and commit noise.
fix ci plz
The non_admin_set_profile_is_accepted test was flaky: admin's GrantPermission and alice's SetProfile were causally independent in the DAG (each on its own per-author prev chain, empty deps), so materialize()'s topo-sort between them depended on HashMap iter order. When SetProfile applied before GrantPermission, the membership gate (PR #505) rejected it and the assertion failed. Wire SetProfile's deps to the GrantPermission event hash. This forces topo-sort to apply the grant first, deterministically. Production behavior (membership gate) unchanged; this is the test-side fix from issue #565's two valid options. Refs #565
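The effect of wiring the dependency can be seen in a toy topological sort. This is a sketch under assumptions: the `Event` shape, the string ids, and `topo_order` are hypothetical and much simpler than the real `materialize()`; the point is only that an explicit dep edge fixes the relative order regardless of `HashMap` iteration order.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified event: only its causal dependencies matter here.
struct Event {
    deps: Vec<&'static str>,
}

/// Depth-first topological order: every event is emitted after its deps.
fn topo_order(events: &HashMap<&'static str, Event>) -> Vec<&'static str> {
    fn visit(
        id: &'static str,
        events: &HashMap<&'static str, Event>,
        done: &mut HashSet<&'static str>,
        out: &mut Vec<&'static str>,
    ) {
        if !done.insert(id) {
            return; // already visited
        }
        if let Some(event) = events.get(id) {
            for dep in &event.deps {
                visit(*dep, events, done, out);
            }
            out.push(id);
        }
    }

    let mut done = HashSet::new();
    let mut out = Vec::new();
    // HashMap iteration order is unspecified, so the starting point varies.
    for id in events.keys() {
        visit(*id, events, &mut done, &mut out);
    }
    out
}
```

With `set_profile` depending on `grant_permission`, the grant is always emitted first; with empty deps on both, either order is a valid result, which matches the flake described above.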
intendednull pushed a commit that referenced this pull request May 4, 2026
Test was flaky (#565). After PR #505's membership gate, the test depended on HashMap iter order in topo-sort: when SetProfile was visited before GrantPermission, alice failed the membership gate and the profile was rejected (`left: None, right: Some("Alice")`). Add explicit causal dep on grant.hash so topo-sort applies GrantPermission first regardless of iter order. Mirrors the fix already on PR #598 for the same test; cherry-picking it here keeps PR #599's master-PR CI green now that the human asked for the fix in this batch rather than waiting for #598 to merge. Refs #565
Scheduled `/resolving-issues` sweep against the 2026-05-03 general-audit (master ticket #567). Five small-scope fixes landed sequentially; one follow-up filed (#597). Picked candidates that did NOT overlap with PR #566 + PR #596's in-flight files.

## Fixes

- `docs(identity): pin verify_strict invariant + test` (6a765ea). Stale-audit-with-residual-gap path: audit assumed `key.verify(...)` was non-strict, but `iroh_base::PublicKey::verify` (re-exported and used here) already calls `verify_strict` internally (`iroh-base-0.98.0/src/key.rs:134-136`). Implementer pinned the invariant via doc comment + added `verify_rejects_non_canonical_s_component` regression test (flips top bit of byte 63 of the `S` scalar to push above the curve order). Defense-in-depth: any future refactor that bypasses iroh-base's wrapper must continue to use the strict primitive.
- `fix(state): check deps cap before sig verify` (a3cc10c). Security HIGH: defeats SendMessages-perm DoS where attacker forces every receiver to pay full ~50µs Ed25519 verify cost on rejected over-cap events. Reordered `EventDag::insert`: `MAX_EVENT_DEPS` + `MAX_ENCRYPTED_KEY_BYTES` cheap structural caps moved BEFORE `event.verify()`. Added `insert_deps_cap_rejects_before_signature_verify` regression test (constructs over-cap event with all-zero signature, asserts `InsertError::DepsTooLong`, which proves the cap fires before the sig-verify path runs). SEC-V-07 comment updated to reflect the new ordering rationale (the original intent comment already prescribed this; fix matched implementation to that intent).
- `fix(messaging): cap Content::File filename + mime` (9a700dd). Added `MAX_FILENAME_BYTES = 255` (POSIX) + `MAX_MIME_BYTES = 255` (RFC 6838 aligned), `MessageValidationError` enum, `Content::validate()` + `Message::validate()`. Validation wired at `InMemoryStore::insert` in messaging crate (closest free inbound boundary; `client/listeners.rs` is locked under PR #566; gap noted in code comment). Advisory-only doc comment on `size_bytes` (attacker-declared; never use for preallocation). 6 new tests covering oversize rejection, boundary acceptance (255 bytes), empty values, no-op for non-File variants, `Message::validate` delegation. No `size_bytes`-driven `Vec::with_capacity` / `reserve` calls found in the tree, so no follow-up needed.
- `fix(web): allow https: img-src for auto-embed` (39a9f1d) + CI rescue `test(web): track CSP img-src https: change` (89f55a2). CSP `img-src 'self' data: blob:` → `img-src 'self' https: data: blob:`. Design intent already established by PR #243 (which added `referrerpolicy="no-referrer"` to external `<img>` embeds, a clear positive choice to render external images while protecting referrer privacy); coordinator pre-decided narrowing eliminated the audit's two-option ambiguity. CI Test job initially failed because `crates/web/tests/static_assets.rs:133` asserts the exact pre-change CSP substring; rescue commit `89f55a2` updated the assertion to track the new directive. Lesson codified in skill (see ## Skill Evolution).
- `fix(client): use web_time::Instant in WorkerCache` (6846fee). Use-rename `std::time::Instant` → `web_time::Instant` in `crates/client/src/worker_cache.rs`. Added `web-time = "1"` workspace dep + per-crate dep on willow-client. Coordinator-narrowed scope: audit's two options (cfg-gate eviction wasm-out vs `web_time::Instant` swap) decisively answered: option B is correct because cfg-gating eviction would mean workers never expire on the web client (wrong behavior). Wasm-target runtime test skipped (wasm-bindgen-test not wired into willow-client; compile-time linkage gate via `cargo check --target wasm32-unknown-unknown` covers the audit's primary concern; gap noted in commit body).

## Already-Fixed
None this run. The audit at #567 was filed 2026-05-03 against `main @ 6404719`, the same HEAD this run started from. No fixes can have landed in the audit-to-fix gap by definition (per the prior PR #566 + #596 lessons documenting same-day audit-to-fix as the expected zero-yield case). Sweep was a one-line check, no time wasted.

## Parked
None. All 5 picks landed cleanly. No mid-flight aborts; no finalize-implementer rescues during dispatch. (CI rescue commit `89f55a2` was a coordinator-side reaction to a master-PR CI failure, not a mid-dispatch finalize.)

## Filed mid-run (follow-ups, NOT closed by this PR)
- [web] tighten is_image_url to https-only after CSP img-src https: change (#584). Filed by coordinator after #584 implementer surfaced the asymmetry: `extract_urls` accepts both `http://` and `https://` but the new CSP `img-src` only allows `https:`. Net behaviour is fine (browser blocks `http://` images via mixed-content guard + CSP, same pre/post fix), but renders dead `<img>` elements. Cleanup deferred because the affected file (`crates/web/src/components/message.rs`) is locked under PR #566.

## Avoided picks (overlap with PR #566 + #596 files in flight)
- `crates/client/src/lib.rs` ❌), same defer rationale PR #596 used. Carry forward.
- `crates/web/src/{voice,event_processing}.rs` + `components/file_share.rs` (all in PR #566 or #596 ❌). Defer.
- `crates/web/src/state.rs` ❌). Defer.
- `crates/client/src/listeners.rs` half-overlap with PR #566). Partial-overlap defer.
- `crates/actor/src/fsm.rs:415` is already inside `#[cfg(test)] mod tests` (line 175); the audit's actual concern is that the audit-glob `'!**/tests*'` doesn't catch in-file test modules. Two valid fixes exist (move test code to external `crates/actor/tests/` file, OR update the audit-glob), needing a design call the coordinator can't make unilaterally. Issue stays open for the next run with fresh context.
- `.github/workflows/ci.yml` is in PR #596 ❌. Doing 2/3 (`deploy.yml` + `e2e.yml`) is partial; doing all 3 conflicts. Defer until #596 merges.
- `sshpass -p` with password + `root@` + `StrictHostKeyChecking=no` #227; design call (pin vs replace). Defer.

## Skill Evolution
Two skill edits this run:

- `450a4e8 docs(skill): cargo.lock conflicts are additive`: adds a sub-paragraph to Implementer Agent step 6's mechanical-call-site note. Codifies that `Cargo.lock` (and `[workspace.dependencies]` table additions) being in another open PR's file list does NOT block a fix that needs a new dep: additions are strictly additive in Cargo.lock, merge resolution is mechanical, refusing creates infinite "wait for PR X to merge" deadlocks. This run's #545 dispatch surfaced the rule (web_time::Instant for WorkerCache wasm fix needed a workspace dep added while Cargo.lock was in PR #566's file list; coordinator brief explicitly authorised the additive churn so implementer didn't abort).
- `57fddc6 docs(skill): static-asset tests need cargo test`: adds a sub-paragraph to Implementer Agent step 7's local-merge-gate. Codifies that non-Rust file changes (HTML / CSS / JSON / TOML / YAML / sw.js) still need `cargo test -p <touched-crate>` because integration tests commonly assert on static-asset contents (e.g. `crates/web/tests/static_assets.rs` greps `index.html` for required CSP directives). This run's #584 brief told the implementer to skip cargo test on the basis "no Rust code changed"; CI Test job failed when `static_assets.rs:133`'s exact-substring assertion missed the new `https:` token. CI rescue `89f55a2` fixed it. Skill edit closes the loophole so future briefs don't repeat.

## Lessons Learned
- `89f55a2`) was the first cross-run failure of the "skip cargo test for non-Rust changes" heuristic. Three runs (#566, #596, this one) all ran briefs that occasionally said "no Rust code, skip cargo test"; this was the first time it bit. Mechanism: integration-test crates assert on static-asset contents; flipping a CSP substring is a real Rust-test signal change even though the source touched is `index.html`. Skill edit `57fddc6` codifies the rule. Cost was ~10 min of CI rescue dispatch + one extra commit on master.
- #243's `referrerpolicy` established design intent → option (a) is correct), #545 (web_time over cfg-gate, with reasoning), CI rescue (single-line constant update). Brainstorm gate intentionally quiet by design: same effective pattern as PR #566 + #596.
- `verify` to `verify_strict`"; implementer found iroh-base.PublicKey.verify already calls verify_strict internally. Pinned the invariant via doc comment + added regression test for non-canonical S rejection. Same shape as PR #566's #549 (audit prescribed `Rc` for dual-target file, which was structurally wrong; coordinator narrowed to lock-ok markers). Skill's "Stale-audit-with-residual-gap path" already covers this.
- `web-time` workspace dep; risk of "implementer aborts because Cargo.lock locked." Coordinator brief explicitly authorised additive churn upfront, implementer shipped cleanly. Skill edit `450a4e8` codifies the rule for future runs so the brief authorisation isn't just an in-context nudge.
- `non_admin_set_profile_is_accepted`) is intermittent, which confirms PR #511's original "flake" reading. First local `cargo test -p willow-state` run on HEAD `6404719` was green (240/240). Second `cargo test --workspace` run on the same HEAD was red (239/240; the regression). Master-PR CI's first Test job was red on the regression after `89f55a2`. Three runs across PR #566 + #596 + this one have now produced both polarities. Conclusion: the test itself is flaky, not deterministically broken since PR #505 as PR #566's coordinator concluded. PR #511's original "sandbox-side flake" dismissal was probably correct; PR #566's "flipped to real" reading was an unlucky-bisect artifact. Action: the underlying flake remains tracked at #565; this PR doesn't change that. Master-PR CI may report the regression red: known, not introduced here. The flake-vs-real polarity is now fully documented across three runs for the next session to settle.

## Test plan
Master-PR CI is the load-bearing gate. Each implementer ran the scoped subset locally (fmt + native clippy + native test + wasm32 check + wasm32 clippy on touched crates).

CI gates to verify on this PR:

- `cargo fmt`
- `cargo clippy` workspace (native + wasm32)
- `cargo test` workspace: Test job will likely report 1 failure intermittently (`willow-state::tests_materialize::non_admin_set_profile_is_accepted`, the #565 flake). NOT introduced by this PR. New regression tests this PR adds (verify-strict, deps-cap-rejection, Content::File caps) all run as native unit tests and pass independently.
- `wasm-pack` browser tests (Firefox + geckodriver; observable on CI only)
- `cargo audit` (no advisory changes this run)
- `trunk build` (HTML CSP attribute syntax check; covers #584)

Cargo.lock conflict expected at merge time with PR #566 (which also touches Cargo.lock). Conflict is strictly additive (this PR adds `web-time` rows; PR #566's rows untouched), so resolution is mechanical.

`non_admin_set_profile_is_accepted` (#565) status: intermittent. Both polarities reproduced across this run's local runs. CI Test job on `89f55a2` reported it red; expected. Known pre-existing, not introduced by this PR.

Generated by Claude Code