feat!: incremental indexing via SPFresh#4837

Merged
BubbleCal merged 35 commits into lance-format:main from BubbleCal:spfresh on Oct 31, 2025

Conversation

Contributor

@BubbleCal BubbleCal commented Sep 29, 2025

This PR implements a new incremental indexing mechanism inspired by SPFresh, aiming to speed up vector index updates without requiring full reindexing.
It introduces dynamic partition split/join and reassignment to maintain index quality efficiently.

Key Changes

Partition Split & Join

  • Split triggered when partition_len > max_part_length
  • Join triggered when partition_len < min_part_length
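
The two triggers above can be sketched as a small decision function. This is only an illustration under stated assumptions: `PartitionAction` and `decide_action` are hypothetical names, not Lance's API, and the real builder may weigh additional conditions.

```rust
/// Maintenance action for a single partition.
/// Hypothetical type for illustration; not part of Lance's API.
#[derive(Debug, PartialEq, Eq)]
pub enum PartitionAction {
    Split,
    Join,
    Keep,
}

/// Apply the two thresholds from the PR description:
/// split when the partition is too long, join when it is too short.
pub fn decide_action(
    partition_len: usize,
    min_part_length: usize,
    max_part_length: usize,
) -> PartitionAction {
    if partition_len > max_part_length {
        PartitionAction::Split
    } else if partition_len < min_part_length {
        PartitionAction::Join
    } else {
        // Lengths exactly at either bound leave the partition untouched,
        // because both comparisons are strict.
        PartitionAction::Keep
    }
}
```

Both comparisons are strict, so a partition whose length equals either bound is left alone.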

Reassignment (LIRE protocol)

  • Reassign vectors during split/join based on centroid distance.
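
As a rough illustration of reassignment by centroid distance, the sketch below assigns each vector to its nearest centroid under squared L2 distance. The function names are hypothetical, the distance metric is an assumption, and the sketch checks every centroid rather than only nearby partitions, so it does not model `reassign_range` or the rest of the LIRE protocol.

```rust
/// Squared Euclidean distance between two vectors of equal length.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Index of the centroid closest to `vector`, or `None` if there are no centroids.
/// Hypothetical helper for illustration; not Lance's API.
pub fn nearest_centroid(vector: &[f32], centroids: &[Vec<f32>]) -> Option<usize> {
    centroids
        .iter()
        .enumerate()
        .map(|(i, c)| (i, squared_l2(vector, c)))
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}

/// Group row indices by their nearest centroid.
/// `members[c]` lists the rows of `vectors` assigned to centroid `c`.
pub fn reassign(vectors: &[Vec<f32>], centroids: &[Vec<f32>]) -> Vec<Vec<usize>> {
    let mut members = vec![Vec::new(); centroids.len()];
    for (row, v) in vectors.iter().enumerate() {
        if let Some(c) = nearest_centroid(v, centroids) {
            members[c].push(row);
        }
    }
    members
}
```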

New Parameters

  • max_part_length: 4x target_partition_size
  • min_part_length: 25% of target_partition_size
  • reassign_range: 64, following the SPFresh paper
  • max_delta_indices: TODO
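
To make the arithmetic concrete, here is a minimal sketch that derives the thresholds from a target partition size. `SplitJoinParams` and `from_target` are hypothetical names chosen for illustration, and `max_delta_indices` is left out because the description marks it as TODO.

```rust
/// Split/join thresholds derived from a target partition size.
/// Hypothetical struct for illustration; not Lance's API.
#[derive(Debug, PartialEq, Eq)]
pub struct SplitJoinParams {
    pub max_part_length: usize,
    pub min_part_length: usize,
    pub reassign_range: usize,
}

impl SplitJoinParams {
    pub fn from_target(target_partition_size: usize) -> Self {
        Self {
            // 4x the target size; saturate instead of overflowing.
            max_part_length: target_partition_size.saturating_mul(4),
            // 25% of the target size (integer division rounds down).
            min_part_length: target_partition_size / 4,
            // Value quoted in the PR description, taken from the SPFresh paper.
            reassign_range: 64,
        }
    }
}
```

For a target of 8192 rows, this gives a split threshold of 32768 and a join threshold of 2048.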

Change to the num_indices_to_merge parameter of OptimizeOptions

  • num_indices_to_merge is now optional, and its default is None (previously 1)
  • When num_indices_to_merge is None, the SPFresh LIRE protocol is used to handle the delta indices automatically
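
The effect of the new `Option<usize>` type can be sketched as a simple match. `MergeStrategy` and `resolve_strategy` are hypothetical names used only to illustrate the semantics described above; they are not taken from the Lance codebase.

```rust
/// How delta indices are handled during optimize.
/// Hypothetical enum for illustration; not Lance's API.
#[derive(Debug, PartialEq, Eq)]
pub enum MergeStrategy {
    /// `None`: let the SPFresh LIRE protocol manage delta indices.
    Automatic,
    /// `Some(n)`: the caller asked for an explicit number of indices to merge.
    MergeLast(usize),
}

pub fn resolve_strategy(num_indices_to_merge: Option<usize>) -> MergeStrategy {
    match num_indices_to_merge {
        None => MergeStrategy::Automatic,
        Some(n) => MergeStrategy::MergeLast(n),
    }
}
```

Under this reading, a caller that relied on the old default would need to pass `Some(1)` explicitly to keep the previous behavior.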

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
@github-actions github-actions Bot added the enhancement and java labels Sep 29, 2025
@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@BubbleCal BubbleCal changed the title feat: incremental indexing via SPFresh feat: incremental indexing via SPFresh Sep 29, 2025
@BubbleCal BubbleCal changed the title from "feat: incremental indexing via SPFresh" to "feat!: incremental indexing via SPFresh" Oct 9, 2025

codecov-commenter commented Oct 9, 2025

Codecov Report

❌ Patch coverage is 81.55642% with 237 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.85%. Comparing base (f7bf882) to head (83ff4d3).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/index/vector/builder.rs | 77.89% | 96 Missing and 55 partials ⚠️ |
| rust/lance/src/index/vector/ivf.rs | 45.34% | 43 Missing and 4 partials ⚠️ |
| rust/lance-index/src/vector/ivf.rs | 0.00% | 21 Missing ⚠️ |
| rust/lance/src/index/vector/ivf/v2.rs | 96.35% | 12 Missing and 2 partials ⚠️ |
| rust/lance-index/src/vector/ivf/transform.rs | 60.00% | 0 Missing and 2 partials ⚠️ |
| rust/lance-index/src/optimize.rs | 87.50% | 1 Missing ⚠️ |
| rust/lance-index/src/vector/bq/storage.rs | 98.30% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4837      +/-   ##
==========================================
+ Coverage   81.77%   81.85%   +0.08%     
==========================================
  Files         340      340              
  Lines      136952   138052    +1100     
  Branches   136952   138052    +1100     
==========================================
+ Hits       111987   113003    +1016     
- Misses      21245    21257      +12     
- Partials     3720     3792      +72     

| Flag | Coverage Δ |
|---|---|
| unittests | 81.85% <81.55%> (+0.08%) ⬆️ |

@BubbleCal BubbleCal closed this Oct 9, 2025
@github-actions github-actions Bot added the python label Oct 9, 2025
@BubbleCal BubbleCal reopened this Oct 10, 2025
@BubbleCal BubbleCal marked this pull request as ready for review October 15, 2025 09:53
  /// A common usage pattern will be that, the caller can keep a large snapshot of the index of the base version,
  /// and accumulate a few delta indices, then merge them into the snapshot.
- pub num_indices_to_merge: usize,
+ pub num_indices_to_merge: Option<usize>,
Contributor

Is this a breaking change?

Contributor Author

Yes, it is. If num_indices_to_merge is not provided, the default behavior is now SPFresh; previously, the default was to merge the new data into the last index.

Contributor

@rpgreen rpgreen left a comment

Does this impact all existing vector indices? Should we have a way to enable it / feature flag it?

Comment thread rust/lance/src/index/vector/ivf.rs
@BubbleCal
Contributor Author

> Does this impact all existing vector indices? Should we have a way to enable it / feature flag it?

It affects most vector indices (all except V1 indices, and I believe all indices today use the v3 format).
Existing code that specifies num_indices_to_merge is unaffected, and as far as I know we do specify this parameter.

@BubbleCal BubbleCal requested review from Xuanwo and rpgreen October 16, 2025 06:28
@BubbleCal
Contributor Author

Will merge this after benchmarking.

@BubbleCal BubbleCal added the donotmerge Do not merge label Oct 17, 2025
@BubbleCal BubbleCal merged commit 2f95f34 into lance-format:main Oct 31, 2025
26 of 27 checks passed
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
4 participants