Conversation
…uild When the search DB already exists, indexResources now diffs incoming docs against the stored index and only processes the delta: new docs get chunked/embedded/stored, stale docs and their chunks get removed, unchanged docs are skipped entirely. Uses node:sqlite directly to query raw chunk-level IDs from the DB (bypassing retriv's parent-ID deduplication) so exact chunk IDs can be passed to remove(). Bumps retriv to 0.12.0 for listIds() support. Resolves #28
📝 WalkthroughWalkthroughThe changes implement incremental search index updates, allowing the system to track existing indexed documents, compute diffs between incoming and stored documents, and perform targeted index modifications instead of full rebuilds. Changes
Sequence DiagramsequenceDiagram
participant SyncCmd as sync-shared command
participant ReIndex as retriv index.ts
participant Pool as retriv pool.ts
participant Worker as retriv worker
participant DB as SQLite DB
SyncCmd->>ReIndex: listIndexIds(config)
ReIndex->>DB: SELECT id FROM documents_meta
DB-->>ReIndex: existing IDs
ReIndex-->>SyncCmd: existing IDs
SyncCmd->>SyncCmd: compute diff:<br/>newDocs, removeIds
alt Up to date
SyncCmd-->>SyncCmd: emit "Search index up to date"
else Has changes
SyncCmd->>Pool: createIndexInWorker(newDocs, {removeIds})
Pool->>Worker: post WorkerMessage {removeIds, docs}
Worker->>DB: getDb(config)
Worker->>DB: db.remove(removeIds)
Worker->>DB: db.index(newDocs)
Worker-->>Pool: indexing complete
Pool-->>SyncCmd: done
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
src/retriv/index.ts (1)
86-100: Consider handling the case when the DB file doesn't exist.
DatabaseSyncwill throwSQLITE_CANTOPENif the database file doesn't exist. While the caller (indexResources) guards withexistsSync(dbPath), this function's contract doesn't make that precondition explicit. If called independently without the file existing, it will throw rather than returning[].Consider adding a file existence check or documenting the precondition:
🛡️ Optional: Add defensive file check
export async function listIndexIds( config: Pick<IndexConfig, 'dbPath'>, ): Promise<string[]> { const nodeSqlite = globalThis.process?.getBuiltinModule?.('node:sqlite') as typeof import('node:sqlite') | undefined if (!nodeSqlite) return [] + const { existsSync } = await import('node:fs') + if (!existsSync(config.dbPath)) + return [] const db = new nodeSqlite.DatabaseSync(config.dbPath, { open: true, readOnly: true })🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/retriv/index.ts` around lines 86 - 100, The listIndexIds function currently calls new nodeSqlite.DatabaseSync(config.dbPath) which throws if the DB file is missing; update listIndexIds to defensively handle a missing DB by checking fs.existsSync(config.dbPath) (or catching the SQLITE_CANTOPEN error) before creating DatabaseSync and return [] when the file doesn't exist or cannot be opened; reference the existing symbols listIndexIds and DatabaseSync when locating the change and ensure the finally block still closes the DB only when opened.test/unit/sync-shared.test.ts (1)
696-739: Good test coverage for incremental indexing scenarios.The tests cover the three key cases:
- Up-to-date index (no changes needed)
- New docs only (add without removals)
- Stale docs (removals including chunk IDs)
Consider adding a test for the combined case where both new docs are added and stale docs are removed simultaneously, as this is the common real-world scenario.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/unit/sync-shared.test.ts` around lines 696 - 739, Add a new unit test that covers the combined scenario where the DB exists and there are both new docs to add and stale IDs to remove: mock existsSync to true, mock listIndexIds to include existing IDs plus stale IDs (and their chunk IDs), mock resolvePkgDir as in other tests, call indexResources with docs that include an existing doc and at least one new doc, then assert createIndex was called, that the first argument contains only the new doc(s) (by id) and the second argument's removeIds lists the stale IDs (including chunk ids). Use the same helpers/mocks referenced in nearby tests: indexResources, createIndex, listIndexIds, existsSync, resolvePkgDir.src/commands/sync-shared.ts (1)
824-835: VerifylistIndexIdserror handling for corrupt/unreadable DB.
listIndexIdscan throw errors other thanSearchDepsUnavailableError(e.g.,SQLITE_CORRUPT, permission errors). These would propagate as uncaught exceptions.Consider whether these should be caught and trigger a full rebuild instead:
🛡️ Optional: Fallback to full rebuild on read errors
let existingIds: string[] try { existingIds = await listIndexIds({ dbPath }) } catch (err) { if (err instanceof SearchDepsUnavailableError) { onProgress('Search indexing skipped (native deps unavailable)') return } + // DB unreadable (corrupt, permissions, etc.) - fall back to full rebuild + onProgress('Index unreadable, rebuilding') + rmSync(dbPath, { recursive: true, force: true }) + existingIds = [] - throw err }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/commands/sync-shared.ts` around lines 824 - 835, The call to listIndexIds can throw DB read errors (e.g., SQLITE_CORRUPT, EACCES) that currently bubble up; instead of rethrowing those, detect non-SearchDepsUnavailableError exceptions inside the try/catch around listIndexIds and treat them as a recoverable read-failure by logging via onProgress (include err.message) and falling back to a full rebuild path — for example set existingIds to an empty array or flip the incremental/full-rebuild flag so the subsequent incremental update logic (which uses existingIds and dbPath) performs a full rebuild; keep the existing behavior for SearchDepsUnavailableError (still skip indexing) and do not swallow unexpected exceptions silently.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/commands/sync-shared.ts`:
- Around line 824-835: The call to listIndexIds can throw DB read errors (e.g.,
SQLITE_CORRUPT, EACCES) that currently bubble up; instead of rethrowing those,
detect non-SearchDepsUnavailableError exceptions inside the try/catch around
listIndexIds and treat them as a recoverable read-failure by logging via
onProgress (include err.message) and falling back to a full rebuild path — for
example set existingIds to an empty array or flip the incremental/full-rebuild
flag so the subsequent incremental update logic (which uses existingIds and
dbPath) performs a full rebuild; keep the existing behavior for
SearchDepsUnavailableError (still skip indexing) and do not swallow unexpected
exceptions silently.
In `@src/retriv/index.ts`:
- Around line 86-100: The listIndexIds function currently calls new
nodeSqlite.DatabaseSync(config.dbPath) which throws if the DB file is missing;
update listIndexIds to defensively handle a missing DB by checking
fs.existsSync(config.dbPath) (or catching the SQLITE_CANTOPEN error) before
creating DatabaseSync and return [] when the file doesn't exist or cannot be
opened; reference the existing symbols listIndexIds and DatabaseSync when
locating the change and ensure the finally block still closes the DB only when
opened.
In `@test/unit/sync-shared.test.ts`:
- Around line 696-739: Add a new unit test that covers the combined scenario
where the DB exists and there are both new docs to add and stale IDs to remove:
mock existsSync to true, mock listIndexIds to include existing IDs plus stale
IDs (and their chunk IDs), mock resolvePkgDir as in other tests, call
indexResources with docs that include an existing doc and at least one new doc,
then assert createIndex was called, that the first argument contains only the
new doc(s) (by id) and the second argument's removeIds lists the stale IDs
(including chunk ids). Use the same helpers/mocks referenced in nearby tests:
indexResources, createIndex, listIndexIds, existsSync, resolvePkgDir.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 85e3bd59-c515-43ca-bf20-eeaf4833a0f9
⛔ Files ignored due to path filters (1)
pnpm-lock.yamlis excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (7)
package.jsonpnpm-workspace.yamlsrc/commands/sync-shared.tssrc/retriv/index.tssrc/retriv/pool.tssrc/retriv/worker.tstest/unit/sync-shared.test.ts
🔗 Linked issue
Resolves #28
❓ Type of change
📚 Description
indexResourcespreviously had all-or-nothing behavior: if the search DB existed, it skipped entirely; if not, it rebuilt everything from scratch. Adding a single new issue to a package with 2000+ indexed docs meant either keeping a stale index or nuking and rebuilding the entire corpus.Now when the DB exists, the function diffs incoming docs against stored IDs and only processes the delta. New docs get chunked, embedded, and stored; stale docs (and their chunks) get removed; unchanged docs are skipped. Uses
node:sqlitedirectly to query raw chunk-level IDs from the DB, bypassing retriv's parent-ID deduplication so exact chunk IDs can be passed toremove(). Bumps retriv to 0.12.0 forlistIds()support.Summary by CodeRabbit
Release Notes
New Features
Dependencies