Skip to content

fix(state): per-author cap for pending buffer (SEC-V-08)#451

Merged
intendednull merged 1 commit into
auto-fix/batch-2026-04-28-002530from
auto-fix/issue-237-pending-per-author-cap
Apr 28, 2026
Merged

fix(state): per-author cap for pending buffer (SEC-V-08)#451
intendednull merged 1 commit into
auto-fix/batch-2026-04-28-002530from
auto-fix/issue-237-pending-per-author-cap

Conversation

@intendednull
Copy link
Copy Markdown
Owner

why

SEC-V-08. PendingBuffer had global cap only. One signer can fill all 5_000 slots with chain-gap events, starve legit out-of-order events for the full 1 h eviction window. Robustness, not impact, but trivial to fix.

what

  • new sub-cap max_per_author on PendingBuffer. default = max_entries / 50 (floor 1).
    • 5_000 client cap -> 100 per author.
    • 10_000 state default -> 200 per author.
  • bookkeeping mirrors cached_count: per_author_count: BTreeMap<EndpointId, usize> updated on insert / resolve / evict_expired / evict_to.
  • warn-once-per-author via warned_full_authors: BTreeSet<EndpointId>. one chatty offender cannot flood logs.
  • with_per_author_cap(usize) builder for tests/explicit overrides. production callers stay on derived default.

tradeoffs

  • runner-up: hardcode the divisor, no override knob. rejected. three existing tests covering global capacity-eviction would need multi-author rewrites or magic-number tuning. override knob is cheaper.
  • per-author drop is silent after first warn for that author. matches existing pending buffer at capacity warn cadence — debug logs can be raised if needed.

tests (state crate, lowest tier)

new:

  • per_author_cap_isolates_one_signer — attacker capped, victim untouched
  • per_author_cap_defaults_to_total_over_divisor — 5_000 / 50 = 100
  • per_author_cap_floor_is_one5 / 50 = 0 floors to 1
  • per_author_cap_warns_once_per_author — bookkeeping HashSet
  • per_author_cap_frees_on_resolve — slots freed on resolve

adapted to use explicit sub-cap so they keep testing global policy:

  • pending_buffer_auto_evicts_when_limit_exceeded
  • capacity_eviction_drops_oldest_when_exceeded
  • pending_count_reflects_both_eviction_policies
  • pending_buffer_eviction_reduces_count_to_cap

deep_pending_chain_does_not_stack_overflow chain shortened 3000 -> 1500 to stay under per-author cap of 2000. iterative-resolution coverage unchanged.

verification

local (no just on the box, ran equivalents):

  • cargo fmt --check -> clean
  • cargo clippy --workspace --all-targets -- -D warnings -> clean
  • cargo test --workspace -> all pass
    • state: 222 passed; 0 failed
    • client: 281 passed; 0 failed
  • cargo check --target wasm32-unknown-unknown (state, identity, messaging, crypto, transport, common, network, client, web) -> clean

Refs #237

https://claude.ai/code/session_01VdcwSdvqig423A5CX2fSzR


Generated by Claude Code

SEC-V-08. PendingBuffer had only a global cap (5_000 in client). One
chatty signer could fill 100% of the slots with unresolved-prev events
and starve legitimate out-of-order events for the whole 1 h eviction
window.

Adds a per-author sub-cap. Default = max_entries / 50 (200 at the
client cap, 100 with the state-crate default of 10_000), with a floor
of 1 so tiny test caps remain usable. When an author is at the
sub-cap, further events from that author are dropped without touching
other authors' buckets.

Bookkeeping mirrors the existing cached_count pattern: per_author_count
BTreeMap<EndpointId, usize> updated on insert / resolve / evict_expired
/ evict_to. Warn-once-per-author via warned_full_authors set so a
single offender cannot flood logs.

Why a config knob and not a hardcoded constant: tests need to override
the sub-cap to exercise the global eviction policy in isolation.
with_per_author_cap(usize) is the explicit override; production
callers stay on the derived default. Runner-up was a hardcoded
constant — rejected because three existing tests would either need
multi-author rewriting or magic-number tuning, and the override is
cheaper to read.

Tests at the lowest tier (state crate, sync.rs):
* per_author_cap_isolates_one_signer
* per_author_cap_defaults_to_total_over_divisor
* per_author_cap_floor_is_one
* per_author_cap_warns_once_per_author
* per_author_cap_frees_on_resolve

Adapted three existing capacity-eviction tests to set an explicit
sub-cap (so they keep testing the global policy):
* sync::tests::pending_buffer_auto_evicts_when_limit_exceeded
* sync::tests::capacity_eviction_drops_oldest_when_exceeded
* sync::tests::pending_count_reflects_both_eviction_policies
* tests::pending_buffer_eviction_reduces_count_to_cap

Reduced deep_pending_chain_does_not_stack_overflow chain depth from
3000 to 1500 so it fits within the 100_000-cap'd ManagedDag's per-
author budget of 2000. Test still proves the iterative resolution
path; capacity policy is validated separately.

Refs #237
@intendednull intendednull merged commit 5b78e61 into auto-fix/batch-2026-04-28-002530 Apr 28, 2026
intendednull pushed a commit that referenced this pull request Apr 28, 2026
…ional

three lessons from batch 2026-04-28-002530:

1. coordinator must `git fetch + reset` master batch branch before each implementer dispatch — stale local state contaminates the next worktree (#451 implementer found half-applied prior work in fresh worktree).

2. when implementer's pre-flight detects upstream fix already landed (#316/#317/#318 vs PR #402), close issues + caveman-comment, do NOT include in master PR `Fixes` list — record under `## Already-Fixed` instead.

3. github webhook subscriptions for sub-PRs are informational only — implementer owns its merge gate, coordinator must not duplicate that work.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants