Problem
Two critical performance issues in event resolution during bulk sync:
- Recursive Call Stack Overflow (crates/state/src/managed.rs::insert_and_apply())
  - When events arrive out of order (common in bulk sync), insert_and_apply() recursively resolves pending events
  - A 10K-event chain arriving reversed creates a 10K-deep call stack
  - Rust's default thread stack (~2MB for spawned threads) can handle roughly 1000-1500 frames before overflow
- Quadratic Eviction Complexity (crates/state/src/sync.rs::evict_to())
  - Each loop iteration calls pending_count(), which iterates all pending events
  - Net complexity: O(p²) where p = pending event count
Current Behavior
Recursive Resolution Pattern
Location: crates/state/src/managed.rs lines 115-128
```rust
// Recursively apply resolved events.
let mut all_resolved = Vec::new();
for r in resolved_from_hash {
    match self.insert_and_apply(r.clone()) { // <-- RECURSIVE CALL
        Ok(outcome) => {
            all_resolved.push(r);
            all_resolved.extend(outcome.resolved);
        }
        Err(_) => { /* ... */ }
    }
}
```
When bulk sync sends 10K events in reverse order:
- Event 10000 arrives first → buffered (prev unknown)
- Event 9999 arrives → buffered
- ...
- Event 1 arrives → triggers resolution cascade
- Stack depth: ~10K frames × ~500-1000 bytes/frame = 5-10MB (exceeds default)
Quadratic Eviction
Location: crates/state/src/sync.rs lines 214-227
```rust
pub fn evict_to(&mut self, max_pending: usize) -> usize {
    let mut evicted = 0;
    while self.pending_count() > max_pending { // <-- O(n) called in O(n) loop
        if let Some(key) = self.waiting_on_prev.keys().next().cloned() {
            if let Some(events) = self.waiting_on_prev.remove(&key) {
                evicted += events.len();
            }
        } else {
            break;
        }
    }
    evicted
}
```
pending_count() iterates all pending events on every loop iteration:

```rust
pub fn pending_count(&self) -> usize {
    self.waiting_on_prev.values().map(|v| v.len()).sum()
}
```
10K buffer evicting to 5K: ~500K iterations total.
Impact at Scale
Stack Overflow Threshold
- 10K-event reversed chain: guaranteed stack overflow
- 1K-2K reversed chain: likely overflow depending on frame size
- 500-event reversed chain: marginal risk
Eviction Performance
- 10K buffer → 500K iterations
- 100K buffer → 5M iterations, stalls async runtime for seconds
Realistic Trigger
Initial sync of a 10K-message server: peer responds with bulk events, topological sort may return per-author chains in any order. A reversed sub-chain triggers the recursion worst case.
Proposed Solutions (Ranked)
1. Iterative Resolution with Explicit Work Queue (Recommended)
Replace recursive insert_and_apply() with a VecDeque work queue:
```rust
pub fn insert_and_apply(&mut self, event: Event) -> Result<InsertOutcome, InsertError> {
    let mut work_queue = VecDeque::new();
    work_queue.push_back(event);
    let mut all_resolved = Vec::new();
    while let Some(current) = work_queue.pop_front() {
        match self.dag.insert(current.clone()) {
            Ok(()) => {
                // apply to state...
                let resolved = self.pending.resolve(&current.hash);
                all_resolved.extend(resolved.iter().cloned());
                work_queue.extend(resolved);
            }
            Err(InsertError::SeqGap { .. }) | Err(InsertError::NotGenesis) => {
                self.pending.buffer_for_prev(current.prev, current);
            }
            Err(e) => return Err(e),
        }
    }
    Ok(InsertOutcome { /* ... */ })
}
```
- Stack depth: O(1) regardless of chain length
- Handles 100K-event reversed chains without overflow
2. Cache Pending Count (Combine with Solution 1)
Maintain a running pending_count: usize on PendingBuffer, incremented on insert and decremented on remove. (A cache that is merely invalidated on every mutation would still be recomputed once per eviction iteration, leaving the loop quadratic.) Reduces eviction from O(p²) to O(p).
3. Batch Resolution with Bounded Depth
Resolve in fixed-size batches (e.g., 100 events), yielding between batches. Less clean than Solution 1, still bounded.
Relevant Code Locations
| Issue | File | Function | Severity |
| --- | --- | --- | --- |
| Recursive Resolution | crates/state/src/managed.rs | insert_and_apply() | CRITICAL |
| Quadratic Eviction | crates/state/src/sync.rs | evict_to() | HIGH |
| O(n) Counter | crates/state/src/sync.rs | pending_count() | HIGH |
| Bulk Sync Receiver | crates/client/src/listeners.rs | SyncBatch handler | CONTEXT |