feat: clarify logical indices and physical index segments#6270
Conversation
|
ACTION NEEDED The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error please inspect the "PR Title Check" action. |
PR Review: Clarify logical indices and physical index segmentsClean PR that introduces a well-structured logical/physical split for index APIs. A few observations: Minor issue
None => self
.load_indices()
.await
.map(|indices| indices.as_ref().clone()),This clones the entire Looks good
🤖 Generated with Claude Code |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
wjones127
left a comment
There was a problem hiding this comment.
I agree with the rename of segments on IndexDescription. I'm not as sure of the value of the convenience method.
|
|
||
| /// Returns the physical index segments that make up this logical index. | ||
| /// | ||
| /// This is an alias for [`Self::metadata`] with a less ambiguous name. |
There was a problem hiding this comment.
praise: +1 I like explicitly calling it segments.
| /// When `name` is provided, only segments belonging to the named logical | ||
| /// index are returned. Otherwise, all index segments in the current dataset | ||
| /// version are returned. | ||
| async fn describe_index_segments(&self, name: Option<&str>) -> Result<Vec<IndexMetadata>> { |
There was a problem hiding this comment.
question(blocking): Users can already call ds.describe_indices(IndexCriteria::default().for_name(name)).await?.first().map(|idx| idx.segments) to get this. I worry adding a new method clutters our API. Do you think it's worth adding this method? Is this called commonly enough where it's worth making a simplified API?
I say this in part because I think segments are a low-level concept that most users won't care about.
There was a problem hiding this comment.
I think it's a good point. Let's remove it from the public API.
| def describe_index_segments( | ||
| self, index_name: Optional[str] = None | ||
| ) -> List[IndexSegmentDescription]: |
There was a problem hiding this comment.
if we removed in Rust, should we also remove it in Python and Java? Or is there a reason to keep it in these bindings?
There was a problem hiding this comment.
ooops, I overlook them
# Conflicts: # rust/lance-index/src/traits.rs
This change makes the logical-index and physical-segment split explicit in the user-facing index APIs without breaking existing behavior.
describe_indicesremains the logical view,describe_index_segmentsbecomes the explicit physical-segment view, and index statistics now exposenum_segments/segmentsalongside the legacy fields for compatibility.The Rust, Python, and Java bindings now use the same model so segment-aware callers do not need to infer semantics from raw manifest metadata. I validated the Rust path with
cargo test -p lance test_optimize_delta_indices -- --nocaptureand the Java path with./mvnw -q -Dtest=DatasetTest#testDescribeIndicesByName test.