fix(boot): make rebuildIndex non-blocking so viewer + later boot steps run #500
mem::observe's boot flow had this sequence in main():
1. registerSearchFunction / registerContextFunction / ...
(sync — completes immediately)
2. restore persisted vector index from disk
3. await rebuildIndex(kv) ← blocks here
4. bootLog "Ready" / "REST API" / "MCP surface"
5. startViewerServer(...)
6. setInterval auto-forget / lesson decay / consolidation
rebuildIndex iterates every observation across every session and AWAITS
an embedding-provider call per record. On a large corpus + a rate-limited
embedding endpoint (e.g. 100 RPM), step 3 takes hours to days.
Everything that runs AFTER it — including startViewerServer — is
silently delayed for the same duration.
Symptoms in the wild:
- http://localhost:3113/ unreachable (no listening socket on the viewer
port) even on a freshly-started server
- `agentmemory doctor` reports "viewer-unreachable"
- log floods with `vector-index add: embed failed — skipping {429: ...}`
from the still-running rebuild burning rate-limit budget
- no error message — the worker stays alive serving HTTP because
sdk.registerFunction had already completed synchronously in step 1
Fix: detach rebuildIndex with `void` + .then/.catch instead of awaiting.
The index lazily fills in over time, search degrades gracefully (BM25
keeps working immediately, vector results fill in as the embed queue
drains), and the viewer comes up in seconds.
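
The detach pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual `src/index.ts` code: the `KV` type, the simulated delay, the `log` array, and the function bodies are all assumptions made so the snippet runs on its own. Only the names `rebuildIndex` and `startViewerServer` and the `void` + `.then`/`.catch` shape come from the commit message.

```typescript
// Hypothetical stand-ins for the real worker functions; only the names
// rebuildIndex and startViewerServer come from the commit message.
type KV = Map<string, string>;

// Simulates a slow rebuild that resolves with the number of records indexed.
function rebuildIndex(kv: KV, delayMs: number = 50): Promise<number> {
  return new Promise<number>((resolve) => {
    setTimeout(() => resolve(kv.size), delayMs);
  });
}

function startViewerServer(log: string[]): void {
  log.push("viewer started");
}

async function main(kv: KV, log: string[]): Promise<void> {
  // Before the fix: `await rebuildIndex(kv)` here held up every later step.
  // After: start the rebuild, attach handlers, and continue immediately.
  // `void` marks the promise as intentionally not awaited.
  void rebuildIndex(kv)
    .then((count) => {
      log.push(`rebuild finished: ${count} records`);
    })
    .catch((err: unknown) => {
      // Without a catch, a rejected detached promise would surface as an
      // unhandled rejection.
      log.push(`rebuild failed: ${String(err)}`);
    });

  startViewerServer(log); // no longer waits for the rebuild
}
```

In this sketch the viewer line is logged before the rebuild line, regardless of how long the simulated rebuild takes.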
Repro on the operator side:
1. import a sizeable jsonl corpus (`mem::replay::import-jsonl`)
2. clear the persisted vector index so rebuildIndex runs on next boot
3. restart agentmemory with EMBEDDING_PROVIDER pointed at a rate-limited
endpoint (any OpenAI-compat with low RPM)
4. observe: REST API responds on :3111, but :3113 is never bound, and
the doctor's "viewer-unreachable" check fires until the rebuild
finishes (hours-to-days for a 300+ session corpus)
The 5-second non-fix workaround was a hard kill + restart; that just
re-entered the same hang.
No tests added — main() isn't unit-tested today and wiring up a fake
slow rebuildIndex + asserting the post-rebuild boot lines run early
would need the full worker mock harness. The change is one line and
the failure mode is dramatic; visual review + integration smoke covers
the regression risk.
…rpora)

rebuildIndex called `await vectorIndexAddGuarded(...)` per memory and per observation. Each call is one HTTP round-trip to the embedding provider for a single input. On a 500k-observation imported corpus against an embedding endpoint with even modest latency, that's serial 100-200ms per call = 14-28 hours of wallclock. The new non-blocking rebuild path (rohitg00#500) made this no longer block boot, but the rebuild itself still takes the same wallclock.

Add `vectorIndexAddBatchGuarded()` next to the existing per-item helper, accepting an array of items and calling `provider.embedBatch()` once. For batchable endpoints (vLLM, Triton, OpenAI's `/v1/embeddings` all accept an `input` array), latency for N items is roughly the latency of a single embed because network + GPU setup amortize.

Refactor `rebuildIndex` to accumulate items into a buffer and flush every REBUILD_EMBED_BATCH_SIZE (default 32). BM25 add stays per-item-synchronous; only the vector path is batched.

Validated against a vLLM Qwen3-Embedding-8B endpoint:
- single embed: 175ms
- batch-of-32: 737ms (= 23ms/item amortized, ~7.6× speedup)
- projected backfill time for 500k obs: 25h → 3h

Per-item failure shape is preserved:
- whole-batch network/provider error → all skipped, single warn line (vs N warns previously when the same error hit every item)
- per-item dimension mismatch → that item skipped, others continue
- rebuildIndex return value unchanged (count of attempted items)

Override knob:
- REBUILD_EMBED_BATCH_SIZE (default 32): set lower for endpoints with small per-request input limits, higher for endpoints that prefer larger batches. Set to 1 to fall back to the per-item path.

39/39 existing tests in search-index/vector-index/remember-bm25-index pass unchanged.

Related: rohitg00#500 (non-blocking rebuildIndex), rohitg00#503 (separate embedding base URL).
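
The buffer-and-flush approach in this commit message might look something like the sketch below. It is an assumption-laden illustration rather than the real `rebuildIndex`: the `EmbedProvider` and `IndexItem` interfaces, the `rebuildVectors` name, and the `Map`-based store are hypothetical. Only the `embedBatch` call, the default batch size of 32, the single-warning behavior on a whole-batch failure, and the "count of attempted items" return value are taken from the text.

```typescript
// Illustrative types; the real provider and item shapes are not shown in the PR.
interface EmbedProvider {
  embedBatch(texts: string[]): Promise<number[][]>;
}

interface IndexItem {
  id: string;
  text: string;
}

// Accumulates items into a buffer and embeds them one batch at a time.
// Returns the number of items attempted, whether or not embedding succeeded.
async function rebuildVectors(
  items: IndexItem[],
  provider: EmbedProvider,
  store: Map<string, number[]>,
  batchSize: number = 32, // mirrors the stated REBUILD_EMBED_BATCH_SIZE default
): Promise<number> {
  let attempted = 0;
  let buffer: IndexItem[] = [];

  const flush = async (): Promise<void> => {
    if (buffer.length === 0) return;
    const batch = buffer;
    buffer = [];
    attempted += batch.length;
    try {
      // One round-trip for the whole batch instead of one per item.
      const vectors = await provider.embedBatch(batch.map((item) => item.text));
      batch.forEach((item, idx) => {
        const vector = vectors[idx];
        if (vector) store.set(item.id, vector);
      });
    } catch (err) {
      // Whole-batch failure: skip every item in it, with a single warning.
      console.warn(`embed batch failed, skipping ${batch.length} items:`, err);
    }
  };

  for (const item of items) {
    buffer.push(item);
    if (buffer.length >= batchSize) await flush();
  }
  await flush(); // flush the final partial batch
  return attempted;
}
```

With 70 items and the default batch size, this sketch makes three provider calls (32, 32, and 6 items) rather than 70.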
…rpora) (#504)
* fix(rebuild): batch embed calls in rebuildIndex (25h → 3h on large corpora)
* fix(rebuild): per-item vi.add try/catch to preserve soft-fail

Restores the pre-batch soft-fail behavior: a single failing vi.add() no longer aborts the entire rebuild batch. Failures are logged and counted toward fail, just like dimension mismatches above.
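
As a rough sketch of the soft-fail behavior this follow-up commit describes, a per-item `try`/`catch` around each add keeps one bad item from aborting the rest of the batch. The `VectorIndexLike` interface, the `addBatchSoftFail` name, the `expectedDim` parameter, and the returned counters are hypothetical; the commit message only states that failures are logged and counted, alongside dimension mismatches.

```typescript
// Hypothetical minimal shape of the vector index; the real `vi` API is not shown.
interface VectorIndexLike {
  add(id: string, vector: number[]): void;
}

// Adds each embedded item individually so one failure does not abort the batch.
function addBatchSoftFail(
  vi: VectorIndexLike,
  ids: string[],
  vectors: number[][],
  expectedDim: number,
): { added: number; failed: number } {
  let added = 0;
  let failed = 0;
  ids.forEach((id, i) => {
    const vector = vectors[i];
    // Dimension mismatch (or a missing vector): skip this item, keep going.
    if (!vector || vector.length !== expectedDim) {
      failed++;
      return;
    }
    try {
      vi.add(id, vector);
      added++;
    } catch (err) {
      // A throwing add() is logged and counted instead of propagating.
      failed++;
      console.warn(`vector-index add failed for ${id}, skipping:`, err);
    }
  });
  return { added, failed };
}
```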
Quality + integration wave. Bundles 11 PRs since v0.9.20:

Contributor feature:
- #237 OpenCode plugin with 22 auto-capture hooks (@cl0ckt0wer)

Bug fixes (9):
- #516 memory_recall endpoint + format/token_budget (@serhiizghama, closes #507/#440)
- #461 env-file AGENTMEMORY_DROP_STALE_INDEX flag honored (@honor2030, closes #456)
- #487 Windows hook path quoting (@honor2030, closes #477)
- #517 viewer IME composition guard (@jonathanzhan1975)
- #472 chunk large sessions for LLM context window (@efenex)
- #473 surface lessons in smart-search + diagnose tally (@efenex)
- #486 declare all Hermes plugin hooks (@honor2030)
- #500 rebuildIndex non-blocking on boot (@efenex)
- #504 batched embed in rebuildIndex (25h -> 3h) (@efenex)
- #491 cli skip onboarding without tty (@honor2030)

Upstream-installer revert:
- #546 drop --next workaround now that iii-hq/iii#1660 shipped

1067/1067 tests pass across 95 files.
Summary
The worker boot flow in `src/index.ts` awaits `rebuildIndex(kv)` before reaching `startViewerServer` (and several other boot steps). `rebuildIndex` iterates every observation across every session and awaits an embedding-provider call per record. On a real corpus with a rate-limited embedding endpoint, that takes hours to days, and everything that runs after it is silently delayed for the same duration.

Symptom in the wild
Operator imports a sizable jsonl corpus (in our case 320 sessions / ~500k observations), restarts agentmemory with `EMBEDDING_PROVIDER` pointed at any rate-limited OpenAI-compat endpoint (Novita / DeepInfra / etc., typically 100 RPM on the cheap plans), and:
- the REST API on `:3111` responds normally
- the viewer on `:3113` is never reachable (no listening socket)
- `agentmemory doctor` reports `viewer-unreachable`
- the log floods with `vector-index add: embed failed — skipping {429: ...}` from the still-running rebuild burning the embedding rate limit
- no error message appears, because the `sdk.registerFunction` calls had already completed synchronously before the rebuild hung

The "obvious" workaround of `agentmemory stop && agentmemory` just re-enters the same hang.

Root cause
`rebuildIndex(kv)` (in `src/functions/search.ts`) awaits `vectorIndexAddGuarded(...)` per record, and each of those calls hits the embedding provider. For a ~500k-observation corpus at 100 RPM, that is 5,000 minutes, or about 3.5 days. The viewer and the auto-forget / lesson-decay / consolidation timers all sit behind it.

Fix
Detach with `void` + `.then`/`.catch`. The index lazily fills in over hours; search degrades gracefully (BM25 keeps working immediately, vector results fill in as the embed queue drains); the viewer + everything else in main() come up in seconds.

Verification
Tested live on the affected corpus before and after:
- Before: every restart left `lsof -ti :3113` empty, doctor reported `viewer-unreachable`, and the log showed 429s piling up indefinitely.
- After: the viewer binds within ~5 seconds of starting and returns the full 188 KB HTML payload on `GET /`. The rebuild continues in the background; vector search results improve over the following minutes.

Test plan
No unit tests added. `main()` isn't unit-tested today, and wiring up a fake slow `rebuildIndex` + asserting the post-rebuild boot lines run early would need the full worker mock harness, which is disproportionate to a one-line behavior change. The failure mode is dramatic enough that visual review + integration smoke covers regression risk.

Files
- `src/index.ts`: 18 insertions, 10 deletions (mostly the comment explaining the rationale)

Related
Surfaced while operating an agentmemory install against a 320-session bulk-imported corpus. The 429 floods that finally pointed at this had me chasing port-conflict / stale-process explanations first (see #474). The real cause is here.