fix(rebuild): batch embed calls in rebuildIndex (25h → 3h on large corpora)#504
Conversation
…rpora) rebuildIndex called `await vectorIndexAddGuarded(...)` per memory and per observation. Each call is one HTTP round-trip to the embedding provider for a single input. On a 500k-observation imported corpus against an embedding endpoint with even modest latency, that's serial 100-200ms per call = 14-28 hours of wallclock. The new non-blocking rebuild path (rohitg00#500) made this no longer block boot, but the rebuild itself still takes the same wallclock. Add `vectorIndexAddBatchGuarded()` next to the existing per-item helper, accepting an array of items and calling `provider.embedBatch()` once. For batchable endpoints (vLLM, Triton, OpenAI's `/v1/embeddings` all accept an `input` array), latency for N items is roughly the latency of a single embed because network + GPU setup amortize. Refactor `rebuildIndex` to accumulate items into a buffer and flush every REBUILD_EMBED_BATCH_SIZE (default 32). BM25 add stays per-item-synchronous; only the vector path is batched. Validated against a vLLM Qwen3-Embedding-8B endpoint: - single embed: 175ms - batch-of-32: 737ms (= 23ms/item amortized, ~7.6× speedup) - projected backfill time for 500k obs: 25h → 3h Per-item failure shape is preserved: - whole-batch network/provider error → all skipped, single warn line (vs N warns previously when the same error hit every item) - per-item dimension mismatch → that item skipped, others continue - rebuildIndex return value unchanged (count of attempted items) Override knob: - REBUILD_EMBED_BATCH_SIZE (default 32) — set lower for endpoints with small per-request input limits, higher for endpoints that prefer larger batches. Set to 1 to fall back to the per-item path. 39/39 existing tests in search-index/vector-index/remember-bm25-index pass unchanged. Related: rohitg00#500 (non-blocking rebuildIndex), rohitg00#503 (separate embedding base URL).
|
@efenex is attempting to deploy a commit to the rohitg00's projects Team on Vercel. A member of the Team first needs to authorize it. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughrebuildIndex now buffers memory and observation embed jobs into configurable batches and writes vectors using a new vectorIndexAddBatchGuarded helper that calls EmbeddingProvider.embedBatch once per batch, validates counts and per-item dimensions, soft-fails on errors, and returns aggregated { ok, fail }. ChangesBatched Embedding for Index Rebuild
Sequence DiagramsequenceDiagram
participant rebuildIndex
participant vectorIndexAddBatchGuarded
participant EmbeddingProvider
participant vectorIndex
rebuildIndex->>rebuildIndex: enqueue EmbedJob items
rebuildIndex->>rebuildIndex: flush when batch size reached
rebuildIndex->>vectorIndexAddBatchGuarded: call with batch items
vectorIndexAddBatchGuarded->>EmbeddingProvider: embedBatch all items once
EmbeddingProvider-->>vectorIndexAddBatchGuarded: vectors for each item
vectorIndexAddBatchGuarded->>vectorIndex: write vectors per item
vectorIndexAddBatchGuarded-->>rebuildIndex: { ok, fail } counts
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly Related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/functions/search.ts (1)
89-99: ⚡ Quick winTrim added WHAT-style comments in changed blocks.
These new comments describe mechanics/intent in detail; prefer clearer naming and keep comments minimal to avoid drift.
As per coding guidelines, "Avoid code comments explaining WHAT — use clear naming instead."
Also applies to: 157-163, 184-185, 269-270
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/functions/search.ts` around lines 89 - 99, The block comment above the function vectorIndexAddBatchGuarded (and the other nearby changed comment blocks) is WHAT-style and too verbose; replace it with a short, intent-focused comment or remove it entirely and instead rely on clear naming/signature (e.g., vectorIndexAddBatchGuarded) and well-chosen variable names to convey behavior. Trim the long multi-line description to a one-line summary like "Batch-embed items and write resulting vectors; skip items on per-item dimension mismatch; log on whole-batch errors." and remove implementation mechanics/details so comments do not duplicate code.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/functions/search.ts`:
- Around line 137-153: The loop that calls vi.add for each item currently lets
any exception abort the entire batch; wrap the per-item call to vi.add(item.id,
item.sessionId, embedding) in a try/catch so individual write-time failures are
swallowed like the old guarded path: on catch, logger.warn with context (use the
same fields: item.context.kind, item.context.logId, provider ep.name) and
increment fail, then continue; only increment ok when vi.add succeeds. Ensure
the existing dimension-mismatch branch remains unchanged.
---
Nitpick comments:
In `@src/functions/search.ts`:
- Around line 89-99: The block comment above the function
vectorIndexAddBatchGuarded (and the other nearby changed comment blocks) is
WHAT-style and too verbose; replace it with a short, intent-focused comment or
remove it entirely and instead rely on clear naming/signature (e.g.,
vectorIndexAddBatchGuarded) and well-chosen variable names to convey behavior.
Trim the long multi-line description to a one-line summary like "Batch-embed
items and write resulting vectors; skip items on per-item dimension mismatch;
log on whole-batch errors." and remove implementation mechanics/details so
comments do not duplicate code.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
Restores the pre-batch soft-fail behavior — a single failing vi.add() no longer aborts the entire rebuild batch. Failures are logged and counted toward fail, just like dimension mismatches above.
|
Thanks for the catch — pushed 81c387a wrapping the per-item |
|
Merged + shipping in v0.9.21. Thanks @efenex — 25h→3h is huge for users with large corpora. The batch-flush + REBUILD_EMBED_BATCH_SIZE env knob shape is clean. |
Quality + integration wave. Bundles 11 PRs since v0.9.20: Contributor feature: - #237 OpenCode plugin with 22 auto-capture hooks (@cl0ckt0wer) Bug fixes (9): - #516 memory_recall endpoint + format/token_budget (@serhiizghama, closes #507/#440) - #461 env-file AGENTMEMORY_DROP_STALE_INDEX flag honored (@honor2030, closes #456) - #487 Windows hook path quoting (@honor2030, closes #477) - #517 viewer IME composition guard (@jonathanzhan1975) - #472 chunk large sessions for LLM context window (@efenex) - #473 surface lessons in smart-search + diagnose tally (@efenex) - #486 declare all Hermes plugin hooks (@honor2030) - #500 rebuildIndex non-blocking on boot (@efenex) - #504 batched embed in rebuildIndex (25h -> 3h) (@efenex) - #491 cli skip onboarding without tty (@honor2030) Upstream-installer revert: - #546 drop --next workaround now that iii-hq/iii#1660 shipped 1067/1067 tests pass across 95 files.
Summary
rebuildIndexcalledawait vectorIndexAddGuarded(...)per memory and per observation — each is one HTTP round-trip embedding a single input. On a real bulk-imported corpus + any non-zero network latency, the per-item serialization dominates wallclock.Add a batched helper
vectorIndexAddBatchGuarded()that callsprovider.embedBatch()once for a buffered group of items, and refactorrebuildIndexto accumulate + flush in batches ofREBUILD_EMBED_BATCH_SIZE(default 32).The numbers
Measured against a self-hosted vLLM (Qwen3-Embedding-8B) endpoint:
That's ~7.6× speedup per item. For a 500k-observation corpus, projected rebuild time drops from ~25 hours to ~3 hours.
The same benefit applies to any batchable OpenAI-compat endpoint — vLLM, Triton, OpenAI's own
/v1/embeddings, LM Studio, llama.cpp server, etc. All accept aninputarray; their providers already amortize network + GPU setup across the batch.Companion to #500
#500 made
rebuildIndexnon-blocking so the viewer + later boot steps run immediately. But the rebuild itself still took the same wallclock. This PR cuts the rebuild itself. After both, boot is fast AND the index hot-loads in a small multiple of the time it takes the provider to physically embed the corpus.Failure semantics
rebuildIndexreturn valueThe soft-fail surface (call always returns, embed errors don't abort the rebuild) is preserved item-for-item.
Configuration
Test plan
npx vitest run test/search-index.test.ts test/vector-index*.test.ts test/remember-bm25-index.test.ts— 39/39 pass unchanged.No new test added for the batching path itself (would need mocking the provider's
embedBatchand asserting buffered flush at boundaries — possible but not done in this PR; the existing tests cover thatrebuildIndexproduces the same final index, and the batched helper is a transparent optimization). Happy to add if reviewer prefers.Verified live against a real corpus + vLLM endpoint by counting vLLM's
vllm:request_success_totalbefore/after a rebuild — request count drops by a factor of ~32 as expected.Files
src/functions/search.ts— newvectorIndexAddBatchGuarded, refactorrebuildIndexto use it; +112 lines, -9 linesSummary by CodeRabbit
New Features
Improvements