state: timeout + capacity eviction for PendingBuffer (#40) #157
Merged
PendingBuffer now drops stale pending events when their predecessor never arrives (partition, permanently offline peer). Each entry carries an optional insertion timestamp. Inserts that supply a wall-clock time go through `buffer_for_prev_at`, which sweeps expired entries first and then enforces the capacity cap. The legacy `buffer_for_prev` is retained (capacity-only) so existing callers keep working.

Defaults: 10_000 max entries, 1h max age (constants exposed as `DEFAULT_PENDING_MAX_ENTRIES` / `DEFAULT_PENDING_MAX_AGE_MS`).

ReplayRole wires its per-server buffer to `buffer_for_prev_at` using `SystemTime::now()` and surfaces `pending_count` via a new field on `WorkerRoleInfo::Replay` so operators can monitor backpressure.

Each eviction logs at `warn!` with the event hash and (for age eviction) the age in ms.

Tests at the state and replay tiers cover:
- age eviction after `max_age_ms` advances
- capacity eviction when `max_entries + 1` is exceeded
- timestamp-less entries immune to age eviction
- `pending_count()` reflects both eviction policies
- `WorkerRoleInfo::Replay::pending_count` exposed correctly

Closes #40

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
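The two insert paths described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `EventHash`, `Event`, `Entry`, the `VecDeque` storage, and the `take_for_prev` helper are hypothetical stand-ins; the choice to evict the oldest insert on overflow and to treat an entry as expired only when its age strictly exceeds `max_age_ms` are assumptions; and `eprintln!` stands in for the `warn!` logging.

```rust
use std::collections::VecDeque;

// Defaults quoted in the PR description (integer types are assumed).
pub const DEFAULT_PENDING_MAX_ENTRIES: usize = 10_000;
pub const DEFAULT_PENDING_MAX_AGE_MS: u64 = 3_600_000; // 1h

// Hypothetical stand-ins for the real hash and event types.
pub type EventHash = u64;

#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub hash: EventHash,
}

struct Entry {
    prev: EventHash,             // predecessor this event is waiting for
    event: Event,
    inserted_at_ms: Option<u64>, // None for legacy, clock-less inserts
}

pub struct PendingBuffer {
    entries: VecDeque<Entry>, // oldest insert at the front
    max_entries: usize,
    max_age_ms: u64,
}

impl PendingBuffer {
    pub fn new(max_entries: usize, max_age_ms: u64) -> Self {
        Self { entries: VecDeque::new(), max_entries, max_age_ms }
    }

    /// Legacy path: no clock is available, so only the capacity cap applies.
    pub fn buffer_for_prev(&mut self, prev: EventHash, event: Event) {
        self.entries.push_back(Entry { prev, event, inserted_at_ms: None });
        self.enforce_capacity();
    }

    /// Clock-aware path: sweep expired entries, insert, then enforce the cap.
    pub fn buffer_for_prev_at(&mut self, prev: EventHash, event: Event, now_ms: u64) {
        self.sweep_expired(now_ms);
        self.entries.push_back(Entry { prev, event, inserted_at_ms: Some(now_ms) });
        self.enforce_capacity();
    }

    pub fn pending_count(&self) -> usize {
        self.entries.len()
    }

    /// Hypothetical helper: remove and return the events waiting on `prev`.
    pub fn take_for_prev(&mut self, prev: EventHash) -> Vec<Event> {
        let mut ready = Vec::new();
        let mut kept = VecDeque::with_capacity(self.entries.len());
        for entry in self.entries.drain(..) {
            if entry.prev == prev {
                ready.push(entry.event);
            } else {
                kept.push_back(entry);
            }
        }
        self.entries = kept;
        ready
    }

    /// Drop timestamped entries older than `max_age_ms`; entries without a
    /// timestamp are never age-evicted.
    fn sweep_expired(&mut self, now_ms: u64) {
        let max_age_ms = self.max_age_ms;
        self.entries.retain(|entry| match entry.inserted_at_ms {
            Some(inserted) => {
                let age_ms = now_ms.saturating_sub(inserted);
                if age_ms > max_age_ms {
                    eprintln!("evicting pending event {:x}: age {} ms", entry.event.hash, age_ms);
                    false
                } else {
                    true
                }
            }
            None => true,
        });
    }

    /// Evict from the front (oldest insert) until the cap is respected.
    fn enforce_capacity(&mut self) {
        while self.entries.len() > self.max_entries {
            if let Some(old) = self.entries.pop_front() {
                eprintln!("evicting pending event {:x}: capacity exceeded", old.event.hash);
            }
        }
    }
}
```

In this sketch a caller with a clock passes `now_ms` on every insert, so stale entries are swept lazily at insert time rather than by a background timer. Entries buffered through the legacy path can only leave via capacity eviction or when their predecessor arrives.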
Summary
- Adds timeout and capacity eviction to `PendingBuffer` so events stuck waiting for a predecessor that will never arrive (partition, offline peer) are dropped instead of accumulating forever.
- `buffer_for_prev_at(prev, event, now_ms)` evicts expired entries first, then enforces the cap. The legacy `buffer_for_prev` path is retained for callers that can't supply a clock (capacity-only eviction).
- `ReplayRole` wires its per-server buffer to `buffer_for_prev_at` using `SystemTime::now()` and surfaces the total pending count via a new `pending_count` field on `WorkerRoleInfo::Replay` so operators can monitor backpressure.
- Defaults: `DEFAULT_PENDING_MAX_ENTRIES = 10_000`, `DEFAULT_PENDING_MAX_AGE_MS = 3_600_000` (1h). Configurable via `ReplayConfig::pending_max_entries` / `pending_max_age_ms`.

Test plan
- `cargo fmt --check`
- `cargo clippy -p willow-state --all-targets -- -D warnings`
- `cargo clippy -p willow-replay --all-targets -- -D warnings`
- `cargo clippy -p willow-common --all-targets -- -D warnings`
- `cargo test -p willow-state`: 184 passed (includes 7 new tests)
- `cargo test -p willow-replay`: 30 passed (includes 2 new tests)
- `cargo test -p willow-worker`: 13 passed (unchanged)
- `cargo check --target wasm32-unknown-unknown -p willow-state`

New coverage:
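- Age eviction once the clock advances past `max_age_ms`.
- Capacity eviction when `max_entries + 1` events are buffered.
- Timestamp-less entries are immune to age eviction.
- `pending_count()` reflects both eviction policies accurately.
- `WorkerRoleInfo::Replay::pending_count` is exposed to heartbeat/monitoring consumers.

Closes #40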
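For readers wiring this up on the replay side, the following sketch shows one way the configuration defaults, the `SystemTime::now()` conversion to milliseconds, and the monitoring field could fit together. It is illustrative only: the field types, the single-field shape of `WorkerRoleInfo::Replay`, and the helpers `now_ms` and `pending_fill_ratio` are assumptions, not the definitions in `willow-replay` or `willow-common`.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Defaults quoted in the PR description (integer types are assumed).
pub const DEFAULT_PENDING_MAX_ENTRIES: usize = 10_000;
pub const DEFAULT_PENDING_MAX_AGE_MS: u64 = 3_600_000; // 1h

// Assumed shape: only the two fields named in the PR are shown.
pub struct ReplayConfig {
    pub pending_max_entries: usize,
    pub pending_max_age_ms: u64,
}

impl Default for ReplayConfig {
    fn default() -> Self {
        Self {
            pending_max_entries: DEFAULT_PENDING_MAX_ENTRIES,
            pending_max_age_ms: DEFAULT_PENDING_MAX_AGE_MS,
        }
    }
}

/// Hypothetical helper: wall-clock milliseconds since the Unix epoch,
/// falling back to 0 if the system clock reads earlier than the epoch.
pub fn now_ms() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_millis() as u64)
        .unwrap_or(0)
}

// Assumed shape: the real variant likely carries other fields as well.
#[derive(Debug)]
pub enum WorkerRoleInfo {
    Replay { pending_count: usize },
}

/// Hypothetical monitoring helper: fraction of the configured cap in use.
pub fn pending_fill_ratio(info: &WorkerRoleInfo, cfg: &ReplayConfig) -> f64 {
    match info {
        WorkerRoleInfo::Replay { pending_count } => {
            *pending_count as f64 / cfg.pending_max_entries.max(1) as f64
        }
    }
}
```

A heartbeat consumer could alert when the ratio stays near 1.0, since that suggests capacity eviction is actively discarding events.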