Search index rebuild is all-or-nothing - no incremental updates #28

@oritwoen

Description

indexResources in sync-shared.ts takes an all-or-nothing approach to the search database: if search.db exists, indexing is skipped entirely; if it doesn't (or --force deletes it), everything is rebuilt from scratch.

This means adding a single new issue or discussion to a package with 3000+ indexed docs triggers a full re-chunk and re-embed of the entire corpus. The embedding cache in embeddings.db helps with texts that hash identically, but chunking still runs over every document again, and any content that changed even slightly (updated docs, new issue comments) produces new hashes and new embedding work.

Concrete scenario: skilld update vue fetches 5 new issues since last sync. Instead of appending those 5 documents to the existing index, the only options are "keep the stale index" or "nuke and rebuild all 2000+ docs." For packages with large doc sets this adds minutes of unnecessary compute on every update.

An incremental path would need to:

  1. Track which document IDs are already in the db
  2. Diff incoming docs against what's stored (add new, update changed, optionally prune removed)
  3. Only chunk and embed the delta
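The diff step (1 and 2 above) could be sketched roughly like this. Everything here is illustrative: Doc, diffDocs, and the id-to-hash map are hypothetical names, not existing skilld internals; the only real assumption is a per-document content hash like the one embeddings.db already relies on.

```typescript
// Hypothetical sketch of the diff step. Doc and diffDocs are
// illustrative names, not actual skilld APIs.
import { createHash } from "node:crypto";

interface Doc {
  id: string;
  content: string;
}

const hash = (text: string): string =>
  createHash("sha256").update(text).digest("hex");

// Compare incoming docs against what the index already holds
// (id -> content hash). Only the returned toIndex set would need
// chunking and embedding; toPrune covers optional removal.
function diffDocs(
  incoming: Doc[],
  stored: Map<string, string>,
): { toIndex: Doc[]; toPrune: string[] } {
  const toIndex = incoming.filter((doc) => {
    const prev = stored.get(doc.id);
    return prev === undefined || prev !== hash(doc.content);
  });
  const incomingIds = new Set(incoming.map((d) => d.id));
  const toPrune = [...stored.keys()].filter((id) => !incomingIds.has(id));
  return { toIndex, toPrune };
}
```

With this shape, the "5 new issues" scenario above would hash 2000+ documents (cheap) but chunk and embed only 5.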

This probably depends on what retriv exposes for incremental writes - if the driver supports insert/upsert on individual documents, the skilld side would just need the diff logic.
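If the driver does support per-document writes, the surface skilld needs from it is small. A hypothetical shape (method names are made up here, not confirmed retriv API), with an in-memory stand-in just to make the contract concrete:

```typescript
// Hypothetical write surface skilld would need from the search driver.
// Whether retriv exposes anything equivalent is the open question above.
interface IncrementalIndex {
  upsert(id: string, text: string): void;
  remove(id: string): void;
  ids(): string[];
}

// In-memory stand-in for illustration only; a real driver would write
// chunks and embeddings to search.db instead of a Map.
class MemoryIndex implements IncrementalIndex {
  private docs = new Map<string, string>();
  upsert(id: string, text: string): void {
    this.docs.set(id, text);
  }
  remove(id: string): void {
    this.docs.delete(id);
  }
  ids(): string[] {
    return [...this.docs.keys()];
  }
}
```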

Metadata

Labels: enhancement (New feature or request)
