`indexResources` in `sync-shared.ts` has a binary approach to the search database: if `search.db` exists, skip entirely; if it doesn't (or `--force` nukes it), rebuild everything from scratch.
This means adding a single new issue or discussion to a package with 3000+ indexed docs triggers a full re-chunk and re-embed of the entire corpus. The embedding cache in `embeddings.db` helps with texts that hash identically, but chunking still runs over every document again, and any content that changed even slightly (updated docs, new issue comments) produces new hashes and new embedding work.
Concrete scenario: `skilld update vue` fetches 5 new issues since last sync. Instead of appending those 5 documents to the existing index, the only options are "keep the stale index" or "nuke and rebuild all 2000+ docs." For packages with large doc sets this adds minutes of unnecessary compute on every update.
An incremental path would need to:
- Track which document IDs are already in the db
- Diff incoming docs against what's stored (add new, update changed, optionally prune removed)
- Only chunk and embed the delta
This probably depends on what retriv exposes for incremental writes. If the driver supports insert/upsert on individual documents, the skilld side would just need the diff logic.
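
A rough sketch of what that diff logic could look like on the skilld side. Everything here is hypothetical and for illustration only: the `Doc` shape, the `hashContent` and `diffDocs` names, and the assumption that a map of document ID to content hash is persisted somewhere alongside the index. It deliberately does not call retriv, since its write API isn't established above.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical document shape; the real indexed type may differ.
interface Doc {
  id: string
  content: string
}

interface IndexDelta {
  toAdd: Doc[] // IDs not present in the stored index
  toUpdate: Doc[] // IDs present, but content hash differs
  toRemove: string[] // stored IDs missing from the incoming set (only when pruning)
}

// Content fingerprint used to detect changed documents.
function hashContent(content: string): string {
  return createHash('sha256').update(content).digest('hex')
}

// `stored` maps document ID -> content hash recorded at the last sync
// (assumed to be persisted next to the index, e.g. in a small manifest).
function diffDocs(stored: Map<string, string>, incoming: Doc[], prune = false): IndexDelta {
  const delta: IndexDelta = { toAdd: [], toUpdate: [], toRemove: [] }
  const seen = new Set<string>()

  for (const doc of incoming) {
    seen.add(doc.id)
    const previousHash = stored.get(doc.id)
    if (previousHash === undefined)
      delta.toAdd.push(doc)
    else if (previousHash !== hashContent(doc.content))
      delta.toUpdate.push(doc)
    // Unchanged documents fall through: no chunking, no embedding.
  }

  if (prune) {
    stored.forEach((_hash, id) => {
      if (!seen.has(id))
        delta.toRemove.push(id)
    })
  }

  return delta
}
```

Only `toAdd` and `toUpdate` would then go through chunking and embedding, so the "5 new issues" case touches 5 documents instead of the whole corpus. Pruning is opt-in here because a partial fetch (for example, only issues since the last sync) would otherwise look like a mass deletion.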