fix: prevent process crash on iii-engine state::set timeout (#204) #209
Under sustained write load (e.g. Claude Code plugin hooks across multiple agents), `IndexPersistence`'s 5s debounce flush could fire into an iii-engine queue that hadn't drained, hitting the SDK's 30s default timeout on `state::set`. The rejection escaped the `setTimeout(() => this.save(), DEBOUNCE_MS)` callback in `src/state/index-persistence.ts` because the returned promise was discarded, landing as an `unhandledRejection` that terminated the long-lived memory service. Reproducible at ~5–15 crashes/hour on ~1.7K observations/day workloads.

Two layers of defense:

1. **`IndexPersistence` swallows kv.set rejections internally.** The scheduled-save callback now `.catch()`s the returned promise and `save()` wraps the kv.set calls in try/catch. Failures are logged via `logger.warn` with throttling (once per minute) so a queue-pressure burst doesn't spam the log. Recent index updates stay in memory and retry on the next debounce flush.
2. **Top-level `unhandledRejection` handler in `src/index.ts`.** Logs and continues for any other path we missed: defense in depth so a single SDK timeout can't take down the memory mesh. Also throttled.

Adds two regression tests in `test/index-persistence.test.ts`:

- scheduled save under a kv.set that rejects must not raise `unhandledRejection`
- direct `save()` must resolve (not throw) when kv.set rejects

Thanks @bunke for the painstaking writeup. The journalctl trace pointing at `iii-sdk/dist/index.mjs:405` made the root cause obvious.

Closes #204
CodeRabbit review (not completed: the pull request was closed or merged during review)

Walkthrough: The changes add resilience to handle uncaught promise rejections from state persistence operations. A top-level rejection handler is introduced in the service entrypoint, alongside improved error handling.
Thanks!
Summary
Under sustained write load (Claude Code plugin hooks across multiple agents at ~75+ obs/hour), `IndexPersistence`'s 5s debounce flush could fire into an iii-engine queue that hadn't drained, hitting the SDK's 30s default timeout on `state::set`. The rejection escaped the `setTimeout(() => this.save(), DEBOUNCE_MS)` callback in `src/state/index-persistence.ts` because the returned promise was discarded — landing as an `unhandledRejection` that terminated the long-lived memory service.
Reproducible at ~5–15 crashes/hour on ~1.7K observations/day workloads (see #204 for trace).
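The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: `save` here is a hypothetical stand-in that always rejects the way a timed-out `state::set` call would, and `scheduleSaveUnsafe` is an illustrative name for the pre-fix scheduling pattern.

```typescript
// Hypothetical stand-in for IndexPersistence.save(): rejects the way a
// timed-out state::set call would. Not the real implementation.
async function save(): Promise<void> {
  throw new Error("state::set timed out (simulated)");
}

// Mirrors the pre-fix pattern: the arrow function returns save()'s promise,
// but setTimeout ignores callback return values, so nothing ever observes
// the rejection and Node reports it as an unhandledRejection.
function scheduleSaveUnsafe(delayMs: number): ReturnType<typeof setTimeout> {
  return setTimeout(() => save(), delayMs);
}
```

With no `unhandledRejection` listener registered, Node.js 15 and later treat such a rejection as an uncaught exception by default, which is consistent with the process termination reported in #204.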
Two layers of defense
1. **`IndexPersistence` swallows kv.set rejections internally.** The scheduled-save callback now `.catch()`s the returned promise, and `save()` wraps the kv.set calls in try/catch. Failures are logged via `logger.warn`, throttled to once per minute so queue-pressure bursts don't spam the log. Recent index updates stay in memory and retry on the next debounce flush.
2. **Top-level `unhandledRejection` handler in `src/index.ts`.** Logs and continues for any other path we missed. This is defense in depth and is also throttled.
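The two layers could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `DebouncedPersistence`, `throttledWarn`, `installRejectionGuard`, the `Logger` and `KV` shapes, and the `"index"` key are all hypothetical names chosen for the example, and the real `IndexPersistence` may reset its timer and build its payload differently.

```typescript
// Hypothetical minimal shapes; the real logger and kv clients may differ.
type Logger = { warn: (msg: string, err?: unknown) => void };
interface KV {
  set(key: string, value: unknown): Promise<void>;
}

// Returns a warn function that logs at most once per intervalMs and
// reports whether this particular call was actually logged.
function throttledWarn(
  logger: Logger,
  intervalMs: number,
  now: () => number = Date.now,
): (msg: string, err?: unknown) => boolean {
  let last = -Infinity;
  return (msg, err) => {
    const t = now();
    if (t - last < intervalMs) return false;
    last = t;
    logger.warn(msg, err);
    return true;
  };
}

class DebouncedPersistence {
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly kv: KV;
  private readonly snapshot: () => unknown;
  private readonly debounceMs: number;
  private readonly warn: (msg: string, err?: unknown) => boolean;

  constructor(kv: KV, snapshot: () => unknown, logger: Logger, debounceMs = 5_000) {
    this.kv = kv;
    this.snapshot = snapshot;
    this.debounceMs = debounceMs;
    this.warn = throttledWarn(logger, 60_000); // once per minute
  }

  scheduleSave(): void {
    if (this.timer !== null) clearTimeout(this.timer); // assumed debounce behavior
    this.timer = setTimeout(() => {
      this.timer = null;
      // Layer 1: the promise is observed, so a rejection cannot escape the timer.
      this.save().catch((err) => this.warn("scheduled save failed", err));
    }, this.debounceMs);
  }

  async save(): Promise<void> {
    try {
      await this.kv.set("index", this.snapshot());
    } catch (err) {
      // Layer 1: swallow and log; in-memory state is retried on the next flush.
      this.warn("index save failed; retrying on next flush", err);
    }
  }
}

// Layer 2: last-resort guard for rejections that slip past every other handler.
function installRejectionGuard(logger: Logger): void {
  const warn = throttledWarn(logger, 60_000);
  process.on("unhandledRejection", (reason) => {
    warn("unhandled rejection (continuing)", reason);
  });
}
```

Injecting the clock into `throttledWarn` is a choice made for this sketch so the throttle can be checked without waiting a real minute. Note that registering any `unhandledRejection` listener replaces Node's default fatal behavior for the whole process, which is why the guard only logs and returns.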
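Tests
Adds two regression tests in `test/index-persistence.test.ts`:
- a scheduled save under a kv.set that rejects must not raise `unhandledRejection`
- a direct `save()` must resolve (not throw) when kv.set rejects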
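A check that a scheduled save "must not raise `unhandledRejection`" needs some way to observe that event during a test. The sketch below shows one framework-free approach; it is not taken from `test/index-persistence.test.ts`, whose test runner and helpers are not shown here. `collectUnhandledRejections`, `rejectingKv`, and `scheduleGuardedSave` are hypothetical names, and the 50 ms settle window is an arbitrary assumption.

```typescript
// Runs `action`, waits `settleMs` for timers and microtasks to flush, and
// returns every reason reported via unhandledRejection in that window.
async function collectUnhandledRejections(
  action: () => void,
  settleMs = 50,
): Promise<unknown[]> {
  const reasons: unknown[] = [];
  const onRejection = (reason: unknown): void => {
    reasons.push(reason);
  };
  process.on("unhandledRejection", onRejection);
  try {
    action();
    await new Promise<void>((resolve) => setTimeout(resolve, settleMs));
  } finally {
    // Always detach so later tests see the default behavior again.
    process.off("unhandledRejection", onRejection);
  }
  return reasons;
}

// Hypothetical kv stub whose set() always rejects, simulating a timeout.
const rejectingKv = {
  set: async (_key: string, _value: unknown): Promise<void> => {
    throw new Error("state::set timed out (simulated)");
  },
};

// A guarded scheduled save in the style of layer 1: the rejection is caught
// inside the timer callback, so it never reaches the process-level event.
function scheduleGuardedSave(delayMs: number): void {
  setTimeout(() => {
    rejectingKv.set("index", {}).catch(() => {
      /* swallowed, as the fixed callback does before logging */
    });
  }, delayMs);
}
```

A test would call `collectUnhandledRejections(() => scheduleGuardedSave(0))` and assert that the returned array is empty. The second regression case needs no helper: awaiting `save()` against the rejecting stub and asserting that the await does not throw is enough.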
Thanks @bunke for the painstaking writeup — the journalctl trace pointing at `iii-sdk/dist/index.mjs:405` made the root cause obvious.
Closes #204
Test plan