refactor!: remove staging from distributed vector indexing#6269

Merged
Xuanwo merged 13 commits into main from xuanwo/remove-vector-index-staging on Mar 24, 2026

@Xuanwo
Collaborator

@Xuanwo Xuanwo commented Mar 24, 2026

This refactors distributed vector indexing to remove the staging-root workflow and treat worker outputs as segments written directly under indices/<segment_uuid>/. Segment planning and build now operate on segments directly, and vector indexing no longer uses merge_index_metadata.

The API and docs are updated around with_segments(...) / plan_segments(...), and the focused distributed vector tests were updated to cover the new workflow.

Java's API will be affected
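
As a rough illustration of the layout described above, and not the actual Lance implementation, the sketch below derives the directory a worker would write a segment into. The `segment_dir` helper and the dataset root argument are hypothetical; only the `indices/<segment_uuid>/` layout comes from the description.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper (not part of Lance): returns the directory for one
/// segment, following the `indices/<segment_uuid>/` layout described above.
pub fn segment_dir(dataset_root: &Path, segment_uuid: &str) -> PathBuf {
    // No staging root in between: the segment sits directly under `indices/`.
    dataset_root.join("indices").join(segment_uuid)
}
```

Each worker would call this with its own segment UUID, so outputs never share a staging directory that has to be merged afterwards.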

@github-actions
Contributor

PR Review

Clean refactor that removes the staging-root indirection from distributed vector indexing. The API simplification (removing the staging_index_uuid parameter and renaming partial_indices → segments) is well-motivated, and the code deletion is substantial.

Issue: .pyi type stubs not updated

The Python type stub files still reference the old API surface:

  • python/python/lance/lance/__init__.pyi still has staging_index_uuid, with_partial_indices, and create_index_segment_builder(staging_index_uuid: str)
  • python/python/lance/lance/indices/__init__.pyi still has staging_index_uuid and partial_indices on IndexSegmentPlan

These stubs will be out of sync with the actual runtime API after this change, which will cause incorrect type checking for downstream users.

Minor: debug_assert in build_segment for empty segments

In rust/lance/src/index/vector/ivf.rs, build_segment uses debug_assert!(!segments.is_empty(), ...) but this invariant is only enforced at the IndexSegmentBuilder::plan() level. If someone calls build_segment directly with a hand-crafted plan, the debug_assert won't fire in release builds. Consider a proper Error return for robustness, or leave it if this is strictly internal.
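
The suggestion above could look roughly like this. It is a sketch under assumptions: `SegmentRef`, `BuildError`, and `build_segment_checked` are hypothetical names, not the types or signature used in rust/lance/src/index/vector/ivf.rs.

```rust
use std::fmt;

/// Hypothetical stand-in for a planned segment; not the real Lance type.
#[derive(Debug, Clone, PartialEq)]
pub struct SegmentRef {
    pub uuid: String,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum BuildError {
    EmptyPlan,
}

impl fmt::Display for BuildError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BuildError::EmptyPlan => write!(f, "segment plan contains no segments"),
        }
    }
}

impl std::error::Error for BuildError {}

/// Validates the input in every build profile. A `debug_assert!` would be
/// compiled out in release builds, so an empty plan would slip through there.
pub fn build_segment_checked(segments: &[SegmentRef]) -> Result<usize, BuildError> {
    if segments.is_empty() {
        return Err(BuildError::EmptyPlan);
    }
    // Placeholder for the real build work: report how many segments were seen.
    Ok(segments.len())
}
```

With this shape, a hand-crafted empty plan yields `Err(BuildError::EmptyPlan)` in both debug and release builds instead of depending on the planner having checked it.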

Breaking change documentation

Calling merge_index_metadata for vector index types now returns an explicit error. This is a breaking change to the public API, so consider noting it in the release notes or a migration guide: the doc update in distributed_indexing.md covers the new workflow well, but existing users calling merge_index_metadata("IVF_PQ") will get a runtime error with no deprecation period.
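
To make the behavior change concrete, here is a minimal sketch of an entry point that rejects vector indices with an explicit error. `IndexKind` and `merge_index_metadata_sketch` are hypothetical, and the assumption that non-vector index types keep working is inferred from the review text, not confirmed by it.

```rust
/// Hypothetical classification used only for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum IndexKind {
    Scalar,
    Vector,
}

/// Hypothetical stand-in for a metadata merge entry point (not the Lance API).
pub fn merge_index_metadata_sketch(kind: IndexKind) -> Result<(), String> {
    match kind {
        // Vector indices are rejected up front with a message that points
        // callers at the segment-based workflow.
        IndexKind::Vector => Err(
            "vector indices no longer use merge_index_metadata; build segments instead"
                .to_string(),
        ),
        // Assumption: other index types are unaffected by this change.
        IndexKind::Scalar => Ok(()),
    }
}
```

A migration note could pair an error message like this with the new `with_segments(...)` / `plan_segments(...)` workflow so callers know what to switch to.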

Overall this is a solid simplification. The main action item is updating the .pyi stubs.

@github-actions github-actions Bot added the java label Mar 24, 2026
@Xuanwo Xuanwo force-pushed the xuanwo/remove-vector-index-staging branch from 889b590 to 4177d82 on March 24, 2026 11:23
Comment thread python/python/lance/indices/__init__.py Outdated
Comment thread rust/lance/src/index/vector/ivf/v2.rs
Contributor

@BubbleCal BubbleCal left a comment

LGTM

@Xuanwo Xuanwo changed the title from "refactor: remove staging from distributed vector indexing" to "refactor!: remove staging from distributed vector indexing" on Mar 24, 2026
@codecov

codecov Bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 91.71975% with 26 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/index/vector/ivf.rs | 83.87% | 5 Missing and 5 partials ⚠️ |
| rust/lance/src/dataset.rs | 0.00% | 6 Missing ⚠️ |
| rust/lance/src/index/create.rs | 95.32% | 5 Missing ⚠️ |
| rust/lance/src/index.rs | 94.20% | 3 Missing and 1 partial ⚠️ |
| rust/lance/src/index/vector.rs | 0.00% | 1 Missing ⚠️ |

@Xuanwo Xuanwo merged commit 66404ca into main Mar 24, 2026
28 checks passed
@Xuanwo Xuanwo deleted the xuanwo/remove-vector-index-staging branch March 24, 2026 16:03
westonpace pushed a commit that referenced this pull request Mar 24, 2026
wjones127 pushed a commit to wjones127/lance that referenced this pull request Mar 29, 2026
…mat#6269)
