Skip to content

Stack Overflow Risk: Recursive Pending Buffer Resolution + Quadratic Eviction #94

@intendednull

Description

@intendednull

Problem

Two critical performance issues in event resolution during bulk sync:

  1. Recursive Call Stack Overflow (crates/state/src/managed.rs::insert_and_apply())

    • When events arrive out of order (common in bulk sync), insert_and_apply() recursively resolves pending events
    • A 10K-event chain arriving reversed creates a 10K-deep call stack
    • Rust's default ~2MB stack can handle roughly 1000-1500 frames before overflow
  2. Quadratic Eviction Complexity (crates/state/src/sync.rs::evict_to())

    • Each loop iteration calls pending_count() which iterates all pending events
    • Net complexity: O(p²) where p = pending event count

Current Behavior

Recursive Resolution Pattern

Location: crates/state/src/managed.rs lines 115-128

// Recursively apply resolved events.
let mut all_resolved = Vec::new();
for r in resolved_from_hash {
    match self.insert_and_apply(r.clone()) {  // <-- RECURSIVE CALL
        Ok(outcome) => {
            all_resolved.push(r);
            all_resolved.extend(outcome.resolved);
        }
        Err(_) => { /* ... */ }
    }
}

When bulk sync sends 10K events in reverse order:

Quadratic Eviction

Location: crates/state/src/sync.rs lines 214-227

pub fn evict_to(&mut self, max_pending: usize) -> usize {
    let mut evicted = 0;
    while self.pending_count() > max_pending {  // <-- O(n) called in O(n) loop
        if let Some(key) = self.waiting_on_prev.keys().next().cloned() {
            if let Some(events) = self.waiting_on_prev.remove(&key) {
                evicted += events.len();
            }
        } else { break; }
    }
    evicted
}

pending_count() iterates all pending events every iteration:

pub fn pending_count(&self) -> usize {
    self.waiting_on_prev.values().map(|v| v.len()).sum()
}

10K buffer evicting to 5K: ~500K iterations total.

Impact at Scale

Stack Overflow Threshold

  • 10K-event reversed chain: guaranteed stack overflow
  • 1K-2K reversed chain: likely overflow depending on frame size
  • 500-event reversed chain: marginal risk

Eviction Performance

  • 10K buffer → 500K iterations
  • 100K buffer → 5M iterations, stalls async runtime for seconds

Realistic Trigger

Initial sync of a 10K-message server: peer responds with bulk events, topological sort may return per-author chains in any order. A reversed sub-chain triggers the recursion worst case.

Proposed Solutions (Ranked)

1. Iterative Resolution with Explicit Work Queue (Recommended)

Replace recursive insert_and_apply() with a VecDeque work queue:

pub fn insert_and_apply(&mut self, event: Event) -> Result<InsertOutcome, InsertError> {
    let mut work_queue = VecDeque::new();
    work_queue.push_back(event);
    let mut all_resolved = Vec::new();

    while let Some(current) = work_queue.pop_front() {
        match self.dag.insert(current.clone()) {
            Ok(()) => {
                // apply to state...
                let resolved = self.pending.resolve(&current.hash);
                work_queue.extend(resolved);
                all_resolved.extend(resolved);
            }
            Err(InsertError::SeqGap { .. }) | Err(InsertError::NotGenesis) => {
                self.pending.buffer_for_prev(current.prev, current);
            }
            Err(e) => return Err(e),
        }
    }
    Ok(InsertOutcome { /* ... */ })
}
  • Stack depth: O(1) regardless of chain length
  • Handles 100K-event reversed chains without overflow

2. Cache Pending Count (Combine with Solution 1)

Add pending_count_cache: Option<usize> to PendingBuffer, invalidated on insert/remove. Reduces eviction from O(n²) to O(n).

3. Batch Resolution with Bounded Depth

Resolve in fixed-size batches (e.g., 100 events), yielding between batches. Less clean than Solution 1, still bounded.

Relevant Code Locations

Issue File Function Severity
Recursive Resolution crates/state/src/managed.rs insert_and_apply() CRITICAL
Quadratic Eviction crates/state/src/sync.rs evict_to() HIGH
O(n) Counter crates/state/src/sync.rs pending_count() HIGH
Bulk Sync Receiver crates/client/src/listeners.rs SyncBatch handler CONTEXT

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions