Skip to content

fix(rebuild): batch embed calls in rebuildIndex (25h → 3h on large corpora)#504

Merged
rohitg00 merged 2 commits into
rohitg00:mainfrom
efenex:fix/batch-embed-rebuild
May 19, 2026
Merged

fix(rebuild): batch embed calls in rebuildIndex (25h → 3h on large corpora)#504
rohitg00 merged 2 commits into
rohitg00:mainfrom
efenex:fix/batch-embed-rebuild

Conversation

@efenex
Copy link
Copy Markdown
Contributor

@efenex efenex commented May 18, 2026

Summary

rebuildIndex called await vectorIndexAddGuarded(...) per memory and per observation — each is one HTTP round-trip embedding a single input. On a real bulk-imported corpus + any non-zero network latency, the per-item serialization dominates wallclock.

Add a batched helper vectorIndexAddBatchGuarded() that calls provider.embedBatch() once for a buffered group of items, and refactor rebuildIndex to accumulate + flush in batches of REBUILD_EMBED_BATCH_SIZE (default 32).

The numbers

Measured against a self-hosted vLLM (Qwen3-Embedding-8B) endpoint:

call shape latency per-item
single embed 175 ms 175 ms
batch of 32 737 ms 23 ms

That's ~7.6× speedup per item. For a 500k-observation corpus, projected rebuild time drops from ~25 hours to ~3 hours.

The same benefit applies to any batchable OpenAI-compat endpoint — vLLM, Triton, OpenAI's own /v1/embeddings, LM Studio, llama.cpp server, etc. All accept an input array; their providers already amortize network + GPU setup across the batch.

Companion to #500

#500 made rebuildIndex non-blocking so the viewer + later boot steps run immediately. But the rebuild itself still took the same wallclock. This PR cuts the rebuild itself. After both, boot is fast AND the index hot-loads in a small multiple of the time it takes the provider to physically embed the corpus.

Failure semantics

failure before after
network/provider error on the call per-item warn × N single warn for the batch, N items marked failed
dimension mismatch on one record per-item warn per-item warn (item skipped, others in batch continue)
rebuildIndex return value count of attempted items unchanged

The soft-fail surface (call always returns, embed errors don't abort the rebuild) is preserved item-for-item.

Configuration

# default
REBUILD_EMBED_BATCH_SIZE=32

# small endpoints that prefer fewer items per call
REBUILD_EMBED_BATCH_SIZE=8

# fall back to per-item path (effectively pre-PR behavior)
REBUILD_EMBED_BATCH_SIZE=1

Test plan

npx vitest run test/search-index.test.ts test/vector-index*.test.ts test/remember-bm25-index.test.ts — 39/39 pass unchanged.

No new test added for the batching path itself (would need mocking the provider's embedBatch and asserting buffered flush at boundaries — possible but not done in this PR; the existing tests cover that rebuildIndex produces the same final index, and the batched helper is a transparent optimization). Happy to add if reviewer prefers.

Verified live against a real corpus + vLLM endpoint by counting vLLM's vllm:request_success_total before/after a rebuild — request count drops by a factor of ~32 as expected.

Files

  • src/functions/search.ts — new vectorIndexAddBatchGuarded, refactor rebuildIndex to use it; +112 lines, -9 lines

Summary by CodeRabbit

  • New Features

    • Batched embedding processing to speed up index rebuilds.
  • Improvements

    • Granular success/failure tracking that skips problematic items without failing entire batches.
    • Configurable rebuild batch size (default 32) for optimized memory/observation indexing.
    • Faster handling when no sessions/data to process, avoiding unnecessary work.

Review Change Stack

…rpora)

rebuildIndex called `await vectorIndexAddGuarded(...)` per memory and
per observation. Each call is one HTTP round-trip to the embedding
provider for a single input. On a 500k-observation imported corpus
against an embedding endpoint with even modest latency, that's
serial 100-200ms per call = 14-28 hours of wallclock. The new
non-blocking rebuild path (rohitg00#500) made this no longer block boot, but
the rebuild itself still takes the same wallclock.

Add `vectorIndexAddBatchGuarded()` next to the existing per-item
helper, accepting an array of items and calling `provider.embedBatch()`
once. For batchable endpoints (vLLM, Triton, OpenAI's `/v1/embeddings`
all accept an `input` array), latency for N items is roughly the
latency of a single embed because network + GPU setup amortize.

Refactor `rebuildIndex` to accumulate items into a buffer and flush
every REBUILD_EMBED_BATCH_SIZE (default 32). BM25 add stays
per-item-synchronous; only the vector path is batched.

Validated against a vLLM Qwen3-Embedding-8B endpoint:
  - single embed: 175ms
  - batch-of-32:  737ms (= 23ms/item amortized, ~7.6× speedup)
  - projected backfill time for 500k obs: 25h → 3h

Per-item failure shape is preserved:
  - whole-batch network/provider error → all skipped, single warn line
    (vs N warns previously when the same error hit every item)
  - per-item dimension mismatch → that item skipped, others continue
  - rebuildIndex return value unchanged (count of attempted items)

Override knob:
  - REBUILD_EMBED_BATCH_SIZE (default 32) — set lower for endpoints
    with small per-request input limits, higher for endpoints that
    prefer larger batches. Set to 1 to fall back to the per-item path.

39/39 existing tests in search-index/vector-index/remember-bm25-index
pass unchanged.

Related: rohitg00#500 (non-blocking rebuildIndex), rohitg00#503 (separate embedding
base URL).
@vercel
Copy link
Copy Markdown

vercel Bot commented May 18, 2026

@efenex is attempting to deploy a commit to the rohitg00's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5d800c34-03c3-4339-bfde-5dcad3ab9f6f

📥 Commits

Reviewing files that changed from the base of the PR and between 5613ce7 and 81c387a.

📒 Files selected for processing (1)
  • src/functions/search.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/functions/search.ts

📝 Walkthrough

Walkthrough

rebuildIndex now buffers memory and observation embed jobs into configurable batches and writes vectors using a new vectorIndexAddBatchGuarded helper that calls EmbeddingProvider.embedBatch once per batch, validates counts and per-item dimensions, soft-fails on errors, and returns aggregated { ok, fail }.

Changes

Batched Embedding for Index Rebuild

Layer / File(s) Summary
Batched embedding helper and batch sizing
src/functions/search.ts
Exports vectorIndexAddBatchGuarded that embeds a batch in one EmbeddingProvider.embedBatch call, validates returned vector counts and per-item dimensions, attempts per-item index writes, logs/skips failures, and returns { ok, fail }. Adds DEFAULT_REBUILD_EMBED_BATCH and getRebuildEmbedBatchSize() reading REBUILD_EMBED_BATCH_SIZE.
rebuildIndex batch queuing and flush infrastructure
src/functions/search.ts
rebuildIndex defines EmbedJob, maintains a pending queue, and adds enqueue()/flush() helpers that call vectorIndexAddBatchGuarded(pending) when the queue reaches the configured batch size. Memory repopulation enqueues title + content jobs with context.kind: "memory", observation repopulation enqueues title + narrative jobs with context.kind: "observation", and remaining buffered jobs are flushed before returning (including an early "no sessions" flush).

Sequence Diagram

sequenceDiagram
  participant rebuildIndex
  participant vectorIndexAddBatchGuarded
  participant EmbeddingProvider
  participant vectorIndex
  rebuildIndex->>rebuildIndex: enqueue EmbedJob items
  rebuildIndex->>rebuildIndex: flush when batch size reached
  rebuildIndex->>vectorIndexAddBatchGuarded: call with batch items
  vectorIndexAddBatchGuarded->>EmbeddingProvider: embedBatch all items once
  EmbeddingProvider-->>vectorIndexAddBatchGuarded: vectors for each item
  vectorIndexAddBatchGuarded->>vectorIndex: write vectors per item
  vectorIndexAddBatchGuarded-->>rebuildIndex: { ok, fail } counts
Loading

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly Related PRs

  • rohitg00/agentmemory#327: Extends the vector-index live-write infrastructure; this PR replaces per-item writes with batched embedding writes and rebuildIndex batching.

Poem

🐰
I hop through queued embeds at last,
Batching memories held fast,
One call to embed, per-item care,
Softly skipping errors there,
Count the wins — the rest I past.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and specifically describes the main change: batching embed calls in rebuildIndex with a concrete performance improvement metric (25h → 3h on large corpora).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/functions/search.ts (1)

89-99: ⚡ Quick win

Trim added WHAT-style comments in changed blocks.

These new comments describe mechanics/intent in detail; prefer clearer naming and keep comments minimal to avoid drift.

As per coding guidelines, "Avoid code comments explaining WHAT — use clear naming instead."

Also applies to: 157-163, 184-185, 269-270

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/functions/search.ts` around lines 89 - 99, The block comment above the
function vectorIndexAddBatchGuarded (and the other nearby changed comment
blocks) is WHAT-style and too verbose; replace it with a short, intent-focused
comment or remove it entirely and instead rely on clear naming/signature (e.g.,
vectorIndexAddBatchGuarded) and well-chosen variable names to convey behavior.
Trim the long multi-line description to a one-line summary like "Batch-embed
items and write resulting vectors; skip items on per-item dimension mismatch;
log on whole-batch errors." and remove implementation mechanics/details so
comments do not duplicate code.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/functions/search.ts`:
- Around line 137-153: The loop that calls vi.add for each item currently lets
any exception abort the entire batch; wrap the per-item call to vi.add(item.id,
item.sessionId, embedding) in a try/catch so individual write-time failures are
swallowed like the old guarded path: on catch, logger.warn with context (use the
same fields: item.context.kind, item.context.logId, provider ep.name) and
increment fail, then continue; only increment ok when vi.add succeeds. Ensure
the existing dimension-mismatch branch remains unchanged.

---

Nitpick comments:
In `@src/functions/search.ts`:
- Around line 89-99: The block comment above the function
vectorIndexAddBatchGuarded (and the other nearby changed comment blocks) is
WHAT-style and too verbose; replace it with a short, intent-focused comment or
remove it entirely and instead rely on clear naming/signature (e.g.,
vectorIndexAddBatchGuarded) and well-chosen variable names to convey behavior.
Trim the long multi-line description to a one-line summary like "Batch-embed
items and write resulting vectors; skip items on per-item dimension mismatch;
log on whole-batch errors." and remove implementation mechanics/details so
comments do not duplicate code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0e055530-d3a9-4b4e-a4b5-854ba5fb4229

📥 Commits

Reviewing files that changed from the base of the PR and between caa9f52 and 5613ce7.

📒 Files selected for processing (1)
  • src/functions/search.ts

Comment thread src/functions/search.ts
Restores the pre-batch soft-fail behavior — a single failing
vi.add() no longer aborts the entire rebuild batch. Failures
are logged and counted toward fail, just like dimension
mismatches above.
@efenex
Copy link
Copy Markdown
Contributor Author

efenex commented May 18, 2026

Thanks for the catch — pushed 81c387a wrapping the per-item vi.add(...) in try/catch inside the batch loop, mirroring your suggested diff. A single failing write now logs + increments fail and the loop continues, matching the pre-batch soft-fail semantics. The existing dimension-mismatch branch is untouched.

@rohitg00 rohitg00 mentioned this pull request May 19, 2026
@rohitg00 rohitg00 merged commit 6c2a689 into rohitg00:main May 19, 2026
1 of 2 checks passed
@rohitg00
Copy link
Copy Markdown
Owner

Merged + shipping in v0.9.21. Thanks @efenex — 25h→3h is huge for users with large corpora. The batch-flush + REBUILD_EMBED_BATCH_SIZE env knob shape is clean.

@rohitg00 rohitg00 mentioned this pull request May 19, 2026
rohitg00 added a commit that referenced this pull request May 19, 2026
Quality + integration wave. Bundles 11 PRs since v0.9.20:

Contributor feature:
- #237 OpenCode plugin with 22 auto-capture hooks (@cl0ckt0wer)

Bug fixes (9):
- #516 memory_recall endpoint + format/token_budget (@serhiizghama, closes #507/#440)
- #461 env-file AGENTMEMORY_DROP_STALE_INDEX flag honored (@honor2030, closes #456)
- #487 Windows hook path quoting (@honor2030, closes #477)
- #517 viewer IME composition guard (@jonathanzhan1975)
- #472 chunk large sessions for LLM context window (@efenex)
- #473 surface lessons in smart-search + diagnose tally (@efenex)
- #486 declare all Hermes plugin hooks (@honor2030)
- #500 rebuildIndex non-blocking on boot (@efenex)
- #504 batched embed in rebuildIndex (25h -> 3h) (@efenex)
- #491 cli skip onboarding without tty (@honor2030)

Upstream-installer revert:
- #546 drop --next workaround now that iii-hq/iii#1660 shipped

1067/1067 tests pass across 95 files.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants