Skip to content

refactor: distributed vector segment build#6220

Merged
Xuanwo merged 8 commits intomainfrom
xuanwo/vector-staging-merge-internal
Mar 21, 2026
Merged

refactor: distributed vector segment build#6220
Xuanwo merged 8 commits intomainfrom
xuanwo/vector-staging-merge-internal

Conversation

@Xuanwo
Copy link
Copy Markdown
Collaborator

@Xuanwo Xuanwo commented Mar 18, 2026

This refactors distributed vector indexing into a staged segment-build pipeline and exposes the public APIs needed to integrate an external distributed build workflow with Lance. It defines the storage-level model for partial shards, staged planning, built segments, and segment commit, and documents the current distributed indexing flow.

This PR builds on the segment commit API from #6209. The main changes are organized into five commits:

Follow-up fixes:

Please review accordingly.


Guide: https://github.com/lance-format/lance/blob/xuanwo/vector-staging-merge-internal/docs/src/guide/distributed_indexing.md

@github-actions
Copy link
Copy Markdown
Contributor

PR Review: refactor: internalize vector staging merge flow

Overall: Well-structured refactor that cleanly separates planning from execution for distributed vector index finalize. Good test coverage across IVF subtypes and error cases. A few items to consider:

P1: load_fragment_bitmap_from_storage scans all partitions and all row IDs

load_fragment_bitmap_from_storage (ivf.rs ~line 2598) iterates every partition and every row ID to reconstruct the fragment bitmap. For large indices with millions of vectors, this could be very expensive during plan_staging_segments (which calls load_partial_vector_segment per shard) and also during load_vector_segment in commit_existing_index. This happens at planning time, not just at merge time.

Consider whether the fragment bitmap could be stored as a sidecar in the partial shard directory during build, or at minimum document the cost tradeoff and ensure callers are aware. For the legacy compat path (commit_existing_indexload_vector_segment) this could be a regression if the index is large.

P1: reset_final_segment_dir deletes before writing — no atomicity

merge_staging_segment_to_dir calls reset_final_segment_dir (which does remove_dir_all) on the final_dir before writing new content. If the process crashes between delete and write completion, the segment is lost. The legacy same-dir path is especially fragile: it deletes the temp dir, copies files to final_dir, then deletes the partial shards — multiple non-atomic steps.

For object stores (S3/GCS) this is somewhat inherent, but for local filesystem consider whether an atomic rename could be used instead, or at least note the non-atomic window in comments.

Minor observations

  • copy_partial_segment_contents does serial object_store.copy() per file. For cloud stores with many files per shard, parallel copies (e.g., futures::stream::iter(...).buffered(N)) would be faster.
  • plan_staging_segments processes shards sequentially in a for loop. Since each load_partial_vector_segment does I/O, these could be loaded concurrently with try_join_all or similar.
  • The chrono dependency added to lance-index is only used for IndexSegment's created_at field in the commit path (in lance/src/index.rs), not in lance-index itself. It doesn't appear to be used in types.rs. Is this dependency needed in lance-index?

Tests

Good coverage: multiple IVF subtypes (flat/pq/sq), HNSW variants, error cases for mismatched subtypes/centroids/overlapping fragments, duplicate segments, empty segments, and the legacy commit_existing_index wrapper. The parametrized rstest approach is clean.

@Xuanwo Xuanwo changed the base branch from main to xuanwo/index-segment-commit-api March 18, 2026 06:43
@Xuanwo Xuanwo force-pushed the xuanwo/vector-staging-merge-internal branch from 5c4279d to 44062e8 Compare March 18, 2026 07:23
@Xuanwo
Copy link
Copy Markdown
Collaborator Author

Xuanwo commented Mar 18, 2026

CI is not running because our base is not main. I will change that once #6209 merged.

Base automatically changed from xuanwo/index-segment-commit-api to main March 18, 2026 17:22
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 18, 2026

Copy link
Copy Markdown
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still trying to get up to speed I think. What's the difference between merging staging files into a segment and combining multiple smaller segments into a larger segment?

Comment thread rust/lance-index/src/vector/distributed/index_merger.rs Outdated
Comment thread rust/lance-index/src/vector/distributed/index_merger.rs Outdated
Comment thread rust/lance/src/index/vector/ivf/v2.rs Outdated
Comment thread rust/lance/src/index/vector/ivf.rs
Comment thread rust/lance/src/index/vector/ivf.rs
Comment thread rust/lance/src/index/vector/ivf.rs Outdated
Comment thread rust/lance/src/index/vector/ivf.rs Outdated
@Xuanwo Xuanwo force-pushed the xuanwo/vector-staging-merge-internal branch from 3c6d5a7 to 4938e8a Compare March 19, 2026 07:13
@Xuanwo
Copy link
Copy Markdown
Collaborator Author

Xuanwo commented Mar 19, 2026

Thank you @westonpace for the review! I have revisited all the concepts, aligned their naming and comments, and compiled them into a document at https://github.com/lance-format/lance/blob/xuanwo/vector-staging-merge-internal/docs/src/guide/distributed_indexing.md. I hope this will be helpful for the review.

@Xuanwo Xuanwo changed the title refactor: internalize vector staging merge flow refactor: internalize distributed vector segment build Mar 19, 2026
@westonpace
Copy link
Copy Markdown
Member

This guide is great, and super helpful. I have a few more questions but we can potentially address these in a follow-up.

How does a worker decide how many shards to create?
Why does each worker create shards and not index segments? For example, let's say I have 5 workers and each worker created 10 shards so I have 50 shards.

Are those shards complete indexes? Why not just make them segments?

Then there is no need for a staging dir. The create_index_segment_builder call would take in a list of segment UUIDs (instead of a staging dir). I don't think you'd need the call with_partial_shards (since we provide these to create_index_segment_builder. Everything else would work as planned.

@Xuanwo Xuanwo changed the title refactor: internalize distributed vector segment build refactor!: distributed vector segment build Mar 20, 2026
@Xuanwo Xuanwo force-pushed the xuanwo/vector-staging-merge-internal branch from 0950fc8 to 1ea42de Compare March 20, 2026 14:32
@Xuanwo Xuanwo force-pushed the xuanwo/vector-staging-merge-internal branch from 1ea42de to 0a3f230 Compare March 20, 2026 15:37
@Xuanwo Xuanwo force-pushed the xuanwo/vector-staging-merge-internal branch from d43183d to b86f91c Compare March 20, 2026 16:59
Copy link
Copy Markdown
Member

@westonpace westonpace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clearly document what the breaking changes are in the PR description?

Overall this seems to be a good extension to the previous single segment mechanism. It is flexible and the API makes sense.

It might be nice to see a full end-to-end example (perhaps in distributed_indexing.md) at some point.

*,
target_partition_size: Optional[int] = None,
skip_transpose: bool = False,
require_commit: bool = True,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a breaking change?

)
return index

def create_index(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to tell how this changed from the existing API. Were there any breaking changes?

@Xuanwo
Copy link
Copy Markdown
Collaborator Author

Xuanwo commented Mar 21, 2026

Can you clearly document what the breaking changes are in the PR description?

I think we should mark #6209 as a breaking change. Therefore, this PR does not necessarily need to be a breaking change now. I will update.

It might be nice to see a full end-to-end example (perhaps in distributed_indexing.md) at some point.

Yes! I'm thinking about this too, will work on a follow-up.

@Xuanwo Xuanwo changed the title refactor!: distributed vector segment build refactor: distributed vector segment build Mar 21, 2026
@Xuanwo Xuanwo merged commit b80fbb3 into main Mar 21, 2026
36 checks passed
@Xuanwo Xuanwo deleted the xuanwo/vector-staging-merge-internal branch March 21, 2026 22:28
Xuanwo pushed a commit that referenced this pull request Mar 24, 2026
## Summary
This fixes a flaky regression test added in #6220
(`b80fbb3231cf58dd50e5670f9c56d309999bbd73`).

The affected test is:
-
`test_distributed_vector_build_commits_multiple_segments_and_preserves_query_results`

Recent failures showed up in both of these runs:
- https://github.com/lance-format/lance/actions/runs/23450834811
  `main` at `244c721504c6ef0b4c2f9700a342509976898d6e`
- https://github.com/lance-format/lance/actions/runs/23460892697
   #6263
  

In those failures, different platforms / index variants failed:
- `linux-arm / case_1_ivf_flat` on `main`
- `linux-build / case_2_ivf_pq` on #6263

That points to an existing flaky test.

## Root Cause
The test compared the exact Top-K `_rowid` results between:
- a single-segment index build, and
- a distributed multi-segment index build

However, the query path used by the test is ANN (`ANNIVFPartition`)
under the default probing behavior. With partial probing, the candidate
set can differ slightly between single-segment and multi-segment
layouts, especially near the tail of Top-K. That makes exact `_rowid`
equality too strict for this test and causes intermittent failures.

## Fix
Make the test probe all IVF partitions before comparing Top-K row ids:
- add `.minimum_nprobes(TWO_FRAG_NUM_PARTITIONS)` to the test query

This keeps the existing strong assertion (`ids_single == ids_split`) but
removes the probing-related source of nondeterminism.

## Testing
Local verification:
- `export PROTOC=/opt/homebrew/opt/protobuf@3/bin/protoc`
- `cargo test -p lance
test_distributed_vector_build_commits_multiple_segments_and_preserves_query_results
-- --nocapture`

Observed result:
- `case_1_ivf_flat ... ok`
- `case_2_ivf_pq ... ok`
- `case_3_ivf_sq ... ok`

I also verified during debugging that with full probing enabled,
repeated runs of the previously flaky `ivf_flat` / `ivf_pq` cases became
stable.
westonpace pushed a commit that referenced this pull request Mar 24, 2026
This refactors distributed vector indexing into a staged segment-build
pipeline and exposes the public APIs needed to integrate an external
distributed build workflow with Lance. It defines the storage-level
model for partial shards, staged planning, built segments, and segment
commit, and documents the current distributed indexing flow.

This PR builds on the segment commit API from
#6209. The main changes are
organized into five commits:

- [test: cover distributed vector segment
build](a86274da2)
- [refactor: internalize distributed vector segment
build](1e5f0e15b)
- [feat: add public vector segment builder
API](691cecb9a)
- [feat: add Python vector segment builder
API](a07ef6144)
- [docs: document distributed vector segment
build](0a3f230d7)

Follow-up fixes:

- [fix: expose python
create_index_uncommitted](c1d3b1666)
- [fix: format python
bindings](b86f91c4d)
- [fix: format python segment builder
bindings](bfc9e63a0)

Please review accordingly.

---

Guide:
https://github.com/lance-format/lance/blob/xuanwo/vector-staging-merge-internal/docs/src/guide/distributed_indexing.md
westonpace pushed a commit that referenced this pull request Mar 24, 2026
## Summary
This fixes a flaky regression test added in #6220
(`b80fbb3231cf58dd50e5670f9c56d309999bbd73`).

The affected test is:
-
`test_distributed_vector_build_commits_multiple_segments_and_preserves_query_results`

Recent failures showed up in both of these runs:
- https://github.com/lance-format/lance/actions/runs/23450834811
  `main` at `244c721504c6ef0b4c2f9700a342509976898d6e`
- https://github.com/lance-format/lance/actions/runs/23460892697
   #6263
  

In those failures, different platforms / index variants failed:
- `linux-arm / case_1_ivf_flat` on `main`
- `linux-build / case_2_ivf_pq` on #6263

That points to an existing flaky test.

## Root Cause
The test compared the exact Top-K `_rowid` results between:
- a single-segment index build, and
- a distributed multi-segment index build

However, the query path used by the test is ANN (`ANNIVFPartition`)
under the default probing behavior. With partial probing, the candidate
set can differ slightly between single-segment and multi-segment
layouts, especially near the tail of Top-K. That makes exact `_rowid`
equality too strict for this test and causes intermittent failures.

## Fix
Make the test probe all IVF partitions before comparing Top-K row ids:
- add `.minimum_nprobes(TWO_FRAG_NUM_PARTITIONS)` to the test query

This keeps the existing strong assertion (`ids_single == ids_split`) but
removes the probing-related source of nondeterminism.

## Testing
Local verification:
- `export PROTOC=/opt/homebrew/opt/protobuf@3/bin/protoc`
- `cargo test -p lance
test_distributed_vector_build_commits_multiple_segments_and_preserves_query_results
-- --nocapture`

Observed result:
- `case_1_ivf_flat ... ok`
- `case_2_ivf_pq ... ok`
- `case_3_ivf_sq ... ok`

I also verified during debugging that with full probing enabled,
repeated runs of the previously flaky `ivf_flat` / `ivf_pq` cases became
stable.
Xuanwo added a commit that referenced this pull request Mar 27, 2026
This tightens the new multi-segment vector index path added in #6220.

It enforces disjoint fragment coverage when committing a segment set,
adds regression coverage that grouped segment coverage matches the union
of its source shard coverage, and verifies that remap only touches
segments covering affected fragments.

It also adds cleanup coverage for both replaced committed segments and
stale uncommitted `_indices/<uuid>` artifacts, and documents these
contracts in the distributed indexing guide.
wjones127 pushed a commit to wjones127/lance that referenced this pull request Mar 29, 2026
This refactors distributed vector indexing into a staged segment-build
pipeline and exposes the public APIs needed to integrate an external
distributed build workflow with Lance. It defines the storage-level
model for partial shards, staged planning, built segments, and segment
commit, and documents the current distributed indexing flow.

This PR builds on the segment commit API from
lance-format#6209. The main changes are
organized into five commits:

- [test: cover distributed vector segment
build](lance-format@a86274da2)
- [refactor: internalize distributed vector segment
build](lance-format@1e5f0e15b)
- [feat: add public vector segment builder
API](lance-format@691cecb9a)
- [feat: add Python vector segment builder
API](lance-format@a07ef6144)
- [docs: document distributed vector segment
build](lance-format@0a3f230d7)

Follow-up fixes:

- [fix: expose python
create_index_uncommitted](lance-format@c1d3b1666)
- [fix: format python
bindings](lance-format@b86f91c4d)
- [fix: format python segment builder
bindings](lance-format@bfc9e63a0)

Please review accordingly.

---

Guide:
https://github.com/lance-format/lance/blob/xuanwo/vector-staging-merge-internal/docs/src/guide/distributed_indexing.md
wjones127 pushed a commit to wjones127/lance that referenced this pull request Mar 29, 2026
)

## Summary
This fixes a flaky regression test added in lance-format#6220
(`b80fbb3231cf58dd50e5670f9c56d309999bbd73`).

The affected test is:
-
`test_distributed_vector_build_commits_multiple_segments_and_preserves_query_results`

Recent failures showed up in both of these runs:
- https://github.com/lance-format/lance/actions/runs/23450834811
  `main` at `244c721504c6ef0b4c2f9700a342509976898d6e`
- https://github.com/lance-format/lance/actions/runs/23460892697
   lance-format#6263
  

In those failures, different platforms / index variants failed:
- `linux-arm / case_1_ivf_flat` on `main`
- `linux-build / case_2_ivf_pq` on lance-format#6263

That points to an existing flaky test.

## Root Cause
The test compared the exact Top-K `_rowid` results between:
- a single-segment index build, and
- a distributed multi-segment index build

However, the query path used by the test is ANN (`ANNIVFPartition`)
under the default probing behavior. With partial probing, the candidate
set can differ slightly between single-segment and multi-segment
layouts, especially near the tail of Top-K. That makes exact `_rowid`
equality too strict for this test and causes intermittent failures.

## Fix
Make the test probe all IVF partitions before comparing Top-K row ids:
- add `.minimum_nprobes(TWO_FRAG_NUM_PARTITIONS)` to the test query

This keeps the existing strong assertion (`ids_single == ids_split`) but
removes the probing-related source of nondeterminism.

## Testing
Local verification:
- `export PROTOC=/opt/homebrew/opt/protobuf@3/bin/protoc`
- `cargo test -p lance
test_distributed_vector_build_commits_multiple_segments_and_preserves_query_results
-- --nocapture`

Observed result:
- `case_1_ivf_flat ... ok`
- `case_2_ivf_pq ... ok`
- `case_3_ivf_sq ... ok`

I also verified during debugging that with full probing enabled,
repeated runs of the previously flaky `ivf_flat` / `ivf_pq` cases became
stable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants