feat!: incremental indexing via SPFresh #4837
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## main #4837 +/- ##
==========================================
+ Coverage 81.77% 81.85% +0.08%
==========================================
Files 340 340
Lines 136952 138052 +1100
Branches 136952 138052 +1100
==========================================
+ Hits 111987 113003 +1016
- Misses 21245 21257 +12
- Partials 3720 3792 +72
/// A common usage pattern will be that, the caller can keep a large snapshot of the index of the base version,
/// and accumulate a few delta indices, then merge them into the snapshot.
- pub num_indices_to_merge: usize,
+ pub num_indices_to_merge: Option<usize>,
Is this a breaking change?
Yes, it is. If `num_indices_to_merge` is not provided, the default behavior is now SPFresh (before this change, the default merged the new data into the last index).
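To make the breaking change concrete, here is a minimal sketch of how the two settings could be told apart. `OptimizeOptionsSketch`, `MergePlan`, and `plan_merge` are hypothetical stand-ins written for illustration; only the field name and its `Option<usize>` type come from the diff above, and the real `OptimizeOptions` has more fields.

```rust
/// Hypothetical stand-in for the option changed in this PR.
/// This is NOT the real `OptimizeOptions`; only the field below mirrors the diff.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct OptimizeOptionsSketch {
    /// `None` (the new default): let the SPFresh-style logic handle delta indices.
    /// `Some(n)`: merge the last `n` delta indices explicitly.
    pub num_indices_to_merge: Option<usize>,
}

/// Illustrative outcome of reading the option (assumed names).
#[derive(Debug, PartialEq)]
pub enum MergePlan {
    /// No count given: split/join/reassign automatically.
    Automatic,
    /// An explicit count was given.
    MergeLast(usize),
}

/// Map the option to a plan; `None` and `Some(n)` take different paths.
pub fn plan_merge(opts: &OptimizeOptionsSketch) -> MergePlan {
    match opts.num_indices_to_merge {
        None => MergePlan::Automatic,
        Some(n) => MergePlan::MergeLast(n),
    }
}
```

Under this reading, callers that relied on the old default would need to pass `Some(1)` explicitly to keep the previous behavior.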
rpgreen left a comment
Does this impact all existing vector indices? Should we have a way to enable it / feature flag it?
It impacts most vector indices (except V1 indices, but I think all indices are in v3 format today).
Will merge this after benchmarking.
This PR implements a new incremental indexing mechanism inspired by SPFresh, aiming to speed up vector index updates without requiring full reindexing. It introduces dynamic partition split/join and reassignment to maintain index quality efficiently.

## Key Changes

### Partition Split & Join
- Split is triggered when `partition_len > max_part_length`
- Join is triggered when `partition_len < min_part_length`

### Reassignment (LIRE protocol)
- Reassign vectors during split/join based on centroid distance.

### New Parameters
- `max_part_length`: 4x `target_partition_size`
- `min_part_length`: 25% of `target_partition_size`
- `reassign_range`: 64, according to the SPFresh paper
- `max_delta_indices`: TODO

### Change `num_indices_to_merge` param of `OptimizeOptions`
- `num_indices_to_merge` is optional now, and the default is `None` (`1` before this)
- When `num_indices_to_merge` is `None`, the SPFresh LIRE protocol is used to handle the delta indices automatically

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
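The split and join thresholds described above can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: `PartitionLimits`, `PartitionAction`, and `decide` are hypothetical names, and the 25% bound is computed with integer division, which rounds down.

```rust
/// Illustrative limits; field names mirror the parameters in the description,
/// but the struct itself is hypothetical.
#[derive(Debug, Clone, Copy)]
pub struct PartitionLimits {
    pub max_part_length: usize,
    pub min_part_length: usize,
    pub reassign_range: usize,
}

impl PartitionLimits {
    /// Derive the limits from a target partition size using the stated ratios.
    pub fn from_target(target_partition_size: usize) -> Self {
        Self {
            max_part_length: target_partition_size * 4, // 4x target
            min_part_length: target_partition_size / 4, // 25% of target, rounded down
            reassign_range: 64,                         // value quoted from the SPFresh paper
        }
    }
}

/// What to do with a partition of a given length (assumed names).
#[derive(Debug, PartialEq, Eq)]
pub enum PartitionAction {
    Split,
    Join,
    Keep,
}

/// Apply the two strict inequalities from the description.
pub fn decide(partition_len: usize, limits: &PartitionLimits) -> PartitionAction {
    if partition_len > limits.max_part_length {
        PartitionAction::Split
    } else if partition_len < limits.min_part_length {
        PartitionAction::Join
    } else {
        PartitionAction::Keep
    }
}
```

With a hypothetical target of 1000 rows, a partition splits above 4000 rows and joins below 250 rows; lengths exactly on a bound are left alone because both comparisons are strict.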
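Reassignment by centroid distance can be illustrated with a small sketch. It assumes squared Euclidean distance and checks every vector against every centroid; the PR and the SPFresh paper restrict the candidates (see `reassign_range`), which this simplified version does not model. All function names here are hypothetical.

```rust
/// Squared Euclidean distance between two equal-length vectors.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `v`, or `None` if there are no centroids.
pub fn nearest_centroid(v: &[f32], centroids: &[Vec<f32>]) -> Option<usize> {
    centroids
        .iter()
        .enumerate()
        .map(|(i, c)| (i, l2_sq(v, c)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}

/// Return `(vector index, new partition)` for every vector whose nearest
/// centroid differs from the partition it currently belongs to.
pub fn reassign(
    vectors: &[Vec<f32>],
    current: &[usize],
    centroids: &[Vec<f32>],
) -> Vec<(usize, usize)> {
    vectors
        .iter()
        .zip(current)
        .enumerate()
        .filter_map(|(i, (v, &cur))| {
            let best = nearest_centroid(v, centroids)?;
            (best != cur).then_some((i, best))
        })
        .collect()
}
```

After a split or join changes the centroids, a pass like this identifies the vectors that now sit closer to a different partition and should move.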