PR Review: feat: add index segment commit API
Overall this is a clean, well-tested PR. A few items worth discussing:
P1: Truncation casting
westonpace left a comment
Some suggestions but looks good overall
    index_details,
    index_version,
    created_at: Some(chrono::Utc::now()),
    base_id: None,
Is there no way to specify a base id? Not that important for now but I think we'll eventually need it.
`IndexSegment` exposes this, but no upper-level APIs use it yet; we need to refactor those APIs first. Yes, we can treat this as a follow-up.
The method was renamed in lance-format#6209 but the test call site in v2.rs was not updated during the merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This refactors distributed vector indexing into a staged segment-build pipeline and exposes the public APIs needed to integrate an external distributed build workflow with Lance. It defines the storage-level model for partial shards, staged planning, built segments, and segment commit, and documents the current distributed indexing flow.

This PR builds on the segment commit API from #6209.

The main changes are organized into five commits:

- [test: cover distributed vector segment build](a86274da2)
- [refactor: internalize distributed vector segment build](1e5f0e15b)
- [feat: add public vector segment builder API](691cecb9a)
- [feat: add Python vector segment builder API](a07ef6144)
- [docs: document distributed vector segment build](0a3f230d7)

Follow-up fixes:

- [fix: expose python create_index_uncommitted](c1d3b1666)
- [fix: format python bindings](b86f91c4d)
- [fix: format python segment builder bindings](bfc9e63a0)

Please review accordingly.

---

Guide: https://github.com/lance-format/lance/blob/xuanwo/vector-staging-merge-internal/docs/src/guide/distributed_indexing.md
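
To make the four stages named in that description (partial shards, staged planning, built segments, segment commit) easier to picture, here is a minimal Rust sketch. Everything in it is hypothetical: `PartialShard`, `SegmentPlan`, `BuiltSegment`, `plan_segments`, `build_segment`, and `commit_segments` are illustration-only names, and the row-count grouping rule is an invented stand-in rather than Lance's actual planning logic.

```rust
/// Hypothetical output of one distributed worker: a partial shard of index data.
#[derive(Debug, Clone)]
struct PartialShard {
    shard_id: u32,
    num_rows: u64,
}

/// Hypothetical staged plan: the shards that will be merged into one segment.
#[derive(Debug, Default, PartialEq)]
struct SegmentPlan {
    shard_ids: Vec<u32>,
    num_rows: u64,
}

/// Hypothetical built segment, ready to be committed.
#[derive(Debug, PartialEq)]
struct BuiltSegment {
    shard_ids: Vec<u32>,
    num_rows: u64,
}

/// Staged planning (assumed rule): greedily group consecutive shards so that
/// each planned segment stays at or below `max_rows`. A single shard larger
/// than `max_rows` still gets a plan of its own.
fn plan_segments(shards: &[PartialShard], max_rows: u64) -> Vec<SegmentPlan> {
    let mut plans = Vec::new();
    let mut current = SegmentPlan::default();
    for shard in shards {
        let would_overflow = current.num_rows + shard.num_rows > max_rows;
        if !current.shard_ids.is_empty() && would_overflow {
            // Close the current plan and start a fresh one.
            plans.push(std::mem::take(&mut current));
        }
        current.shard_ids.push(shard.shard_id);
        current.num_rows += shard.num_rows;
    }
    if !current.shard_ids.is_empty() {
        plans.push(current);
    }
    plans
}

/// Build step: in this sketch it only carries the plan's metadata forward.
fn build_segment(plan: SegmentPlan) -> BuiltSegment {
    BuiltSegment {
        shard_ids: plan.shard_ids,
        num_rows: plan.num_rows,
    }
}

/// Commit step: in this sketch it only reports the total rows covered.
fn commit_segments(segments: &[BuiltSegment]) -> u64 {
    segments.iter().map(|s| s.num_rows).sum()
}
```

Keeping planning, building, and committing as separate functions mirrors the staged shape of the pipeline: an external scheduler could run `build_segment` for each plan on a different worker and call the commit step once at the end.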
This adds a first-class `IndexSegment` type and a `commit_existing_index_segments` API so a logical index can be committed from multiple physical segments. It is intended as a building block for the multi-segment vector index work without changing the high-level `create_index` flow. Part of lance-format#6180
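
The idea of one logical index backed by several physical segments can be sketched as follows. This is not Lance's API: `SegmentSketch`, `LogicalIndexSketch`, `commit_segments_sketch`, and `covered_fragments` are hypothetical names, and the validation shown (rejecting an empty list and overlapping fragment coverage) is an assumption made for illustration, not a statement of what `commit_existing_index_segments` checks.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for one physical index segment (not Lance's `IndexSegment`).
#[derive(Debug, Clone, PartialEq)]
struct SegmentSketch {
    /// Identifier of the segment's files; a plain string in this sketch.
    id: String,
    /// Fragment ids this segment is assumed to cover.
    fragments: BTreeSet<u32>,
}

/// Hypothetical logical index: one name backed by several physical segments.
#[derive(Debug, PartialEq)]
struct LogicalIndexSketch {
    name: String,
    segments: Vec<SegmentSketch>,
}

/// Combine already-built segments under one logical index name.
/// Assumed rules: the list must be non-empty, and no fragment may be
/// covered by more than one segment.
fn commit_segments_sketch(
    name: &str,
    segments: Vec<SegmentSketch>,
) -> Result<LogicalIndexSketch, String> {
    if segments.is_empty() {
        return Err("at least one segment is required".to_string());
    }
    let mut seen = BTreeSet::new();
    for seg in &segments {
        for frag in &seg.fragments {
            // `insert` returns false when the fragment was already seen.
            if !seen.insert(*frag) {
                return Err(format!(
                    "fragment {frag} is covered by more than one segment"
                ));
            }
        }
    }
    Ok(LogicalIndexSketch {
        name: name.to_string(),
        segments,
    })
}

/// Union of the fragments covered by all segments of the logical index.
fn covered_fragments(index: &LogicalIndexSketch) -> BTreeSet<u32> {
    index
        .segments
        .iter()
        .flat_map(|s| s.fragments.iter().copied())
        .collect()
}
```

In this sketch the high-level creation path is untouched; the commit function only assembles segments that were built elsewhere, which is the separation the description points at.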