
feat: clarify logical indices and physical index segments #6270

Merged
Xuanwo merged 7 commits into main from xuanwo/index-segment-api
Mar 27, 2026

@Xuanwo
Collaborator

@Xuanwo Xuanwo commented Mar 24, 2026

This change makes the logical-index and physical-segment split explicit in the user-facing index APIs without breaking existing behavior. `describe_indices` remains the logical view, `describe_index_segments` becomes the explicit physical-segment view, and index statistics now expose `num_segments` / `segments` alongside the legacy fields (`num_indices` / `indices`) for compatibility.

The Rust, Python, and Java bindings now use the same model, so segment-aware callers do not need to infer semantics from raw manifest metadata. I validated the Rust path with `cargo test -p lance test_optimize_delta_indices -- --nocapture` and the Java path with `./mvnw -q -Dtest=DatasetTest#testDescribeIndicesByName test`.
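To make the split concrete: several physical segments can share one logical index name, and the logical view groups them. The following is a minimal sketch, assuming segments are grouped by index name and using hypothetical `SegmentMeta` and `LogicalIndex` types rather than Lance's actual `IndexMetadata` and `IndexDescription`:

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for one physical index segment
/// (Lance's real `IndexMetadata` carries many more fields).
#[derive(Debug, Clone, PartialEq)]
pub struct SegmentMeta {
    /// Name of the logical index this segment belongs to.
    pub name: String,
    /// Hypothetical identifier for the physical segment.
    pub segment_id: u32,
}

/// Hypothetical logical view: one entry per index name.
#[derive(Debug)]
pub struct LogicalIndex {
    pub name: String,
    pub segments: Vec<SegmentMeta>,
}

impl LogicalIndex {
    /// Number of physical segments backing this logical index.
    pub fn num_segments(&self) -> usize {
        self.segments.len()
    }
}

/// Groups a flat list of physical segments (as stored in the manifest)
/// into logical indices keyed by name; output is sorted by name.
pub fn group_into_logical(segments: &[SegmentMeta]) -> Vec<LogicalIndex> {
    let mut by_name: BTreeMap<&str, Vec<SegmentMeta>> = BTreeMap::new();
    for seg in segments {
        by_name.entry(seg.name.as_str()).or_default().push(seg.clone());
    }
    by_name
        .into_iter()
        .map(|(name, segments)| LogicalIndex {
            name: name.to_string(),
            segments,
        })
        .collect()
}
```

Callers that only care about "which indices exist" work with the grouped list; segment-aware callers can still look inside `segments`.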

@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error, please inspect the "PR Title Check" action.

@Xuanwo Xuanwo changed the title from "Clarify logical indices and physical index segments" to "feat: clarify logical indices and physical index segments" Mar 24, 2026
@github-actions github-actions Bot added the enhancement label Mar 24, 2026
@github-actions
Contributor

PR Review: Clarify logical indices and physical index segments

Clean PR that introduces a well-structured logical/physical split for index APIs. A few observations:

Minor issue

`describe_index_segments` clones all metadata when `name` is `None` (`rust/lance-index/src/traits.rs:261`):

```rust
None => self
    .load_indices()
    .await
    .map(|indices| indices.as_ref().clone()),
```

This clones the entire `Vec<IndexMetadata>` out of the `Arc`. Since callers of `load_indices()` already get an `Arc`, consider returning `Arc<Vec<IndexMetadata>>` or accepting the clone cost explicitly. This is not a hot path, but it is worth noting because index metadata can grow with many segments.
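The trade-off can be seen in isolation. This is a minimal sketch, assuming a simplified `IndexMetadata` stand-in (not Lance's real type) and a plain `Arc<Vec<_>>` in place of whatever `load_indices()` caches; both function names are hypothetical:

```rust
use std::sync::Arc;

/// Hypothetical stand-in for Lance's `IndexMetadata`.
#[derive(Debug, Clone)]
pub struct IndexMetadata {
    pub name: String,
}

/// Current shape: deep-copies every element out of the shared Arc,
/// allocating a new Vec and cloning each metadata entry.
pub fn segments_owned(cached: &Arc<Vec<IndexMetadata>>) -> Vec<IndexMetadata> {
    cached.as_ref().clone()
}

/// Alternative suggested in the review: share the cached allocation.
/// Cloning an Arc only increments a reference count.
pub fn segments_shared(cached: &Arc<Vec<IndexMetadata>>) -> Arc<Vec<IndexMetadata>> {
    Arc::clone(cached)
}
```

Returning the `Arc` keeps the call O(1) regardless of segment count, at the cost of exposing a shared, read-only collection in the signature.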

Looks good

  • Alias approach (`segments`/`num_segments` alongside `indices`/`num_indices`) is a clean way to introduce better naming without breaking existing consumers.
  • Extracting `PyIndexSegmentDescription::from_metadata` removes duplication nicely.
  • Test coverage is solid across all three language bindings.
  • The `.clone()` on `indices_stats` inside the `json!` macro is necessary since the value is consumed twice; that's fine.
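The alias approach can be illustrated with a small sketch. It assumes a hypothetical `IndexStats` type whose `segment_names` list stands in for whatever per-segment data the real stats carry, and it emits plain key/value pairs instead of the `json!` value the PR builds. The point is only that the new keys and the legacy keys carry identical data, which is also why a value has to be cloned when it is used twice:

```rust
/// Hypothetical, simplified index-statistics record.
pub struct IndexStats {
    pub segment_names: Vec<String>,
}

impl IndexStats {
    /// Returns (key, value) pairs for serialization: the new names plus the
    /// legacy names, both carrying the same data so old readers keep working.
    pub fn fields(&self) -> Vec<(&'static str, String)> {
        let count = self.segment_names.len().to_string();
        let list = format!("{:?}", self.segment_names);
        vec![
            // New, less ambiguous keys (each value is used twice, so clone here).
            ("num_segments", count.clone()),
            ("segments", list.clone()),
            // Legacy aliases: the same values moved in under the old keys.
            ("num_indices", count),
            ("indices", list),
        ]
    }
}
```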

🤖 Generated with Claude Code

@codecov

codecov Bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 82.35294% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/lance-index/src/traits.rs | 0.00% | 3 Missing ⚠️ |

@wjones127 wjones127 self-assigned this Mar 24, 2026
Contributor

@wjones127 wjones127 left a comment

I agree with the rename to `segments` on `IndexDescription`. I'm less sure of the value of the convenience method.


```rust
/// Returns the physical index segments that make up this logical index.
///
/// This is an alias for [`Self::metadata`] with a less ambiguous name.
```
Contributor

praise: +1 I like explicitly calling it segments.
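One way to expose `segments` as a pure alias is a default trait method, so existing implementors need no changes. This is a minimal sketch, assuming a simplified `IndexDescription` trait and a hypothetical `SimpleDescription` implementor; the real Lance trait has more methods, and its exact signatures are not shown in this thread:

```rust
/// Hypothetical stand-in for one physical index segment.
#[derive(Debug, Clone, PartialEq)]
pub struct IndexMetadata {
    pub uuid: String,
}

/// Simplified sketch of a logical-index description trait.
pub trait IndexDescription {
    /// Original accessor: metadata for every segment of this logical index.
    fn metadata(&self) -> &[IndexMetadata];

    /// Returns the physical index segments that make up this logical index.
    /// Default method: an alias for `metadata` with a less ambiguous name.
    fn segments(&self) -> &[IndexMetadata] {
        self.metadata()
    }
}

/// Hypothetical implementor that only provides `metadata`.
pub struct SimpleDescription {
    pub parts: Vec<IndexMetadata>,
}

impl IndexDescription for SimpleDescription {
    fn metadata(&self) -> &[IndexMetadata] {
        &self.parts
    }
}
```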

Comment thread rust/lance-index/src/traits.rs Outdated
```rust
/// When `name` is provided, only segments belonging to the named logical
/// index are returned. Otherwise, all index segments in the current dataset
/// version are returned.
async fn describe_index_segments(&self, name: Option<&str>) -> Result<Vec<IndexMetadata>> {
```
Contributor

question(blocking): Users can already call `ds.describe_indices(IndexCriteria::default().for_name(name)).await?.first().map(|idx| idx.segments)` to get this. I worry that adding a new method clutters our API. Do you think it's worth adding this method? Is it called commonly enough that it's worth a simplified API?

I say this in part because I think segments are a low-level concept that most users won't care about.
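The caller-side alternative the reviewer describes (filter the logical view by name, take the first match, read its segments) can be sketched without Lance types. This assumes hypothetical `Description` and `Segment` structs and a hypothetical `segments_of` helper; the real call goes through `describe_indices` with an `IndexCriteria` filter as quoted in the comment:

```rust
/// Hypothetical stand-in for one physical segment.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub id: u32,
}

/// Hypothetical stand-in for one logical index description.
#[derive(Debug)]
pub struct Description {
    pub name: String,
    pub segments: Vec<Segment>,
}

/// Caller-side equivalent of a dedicated "segments for this name" method:
/// find the first logical index with the given name and borrow its segments.
/// Returns `None` when no logical index has that name.
pub fn segments_of<'a>(indices: &'a [Description], name: &str) -> Option<&'a [Segment]> {
    indices
        .iter()
        .find(|idx| idx.name == name)
        .map(|idx| idx.segments.as_slice())
}
```

Because this is a one-liner over the logical API, it supports the argument that a separate public method adds surface area without much convenience.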

Collaborator Author

I think it's a good point. Let's remove it from the public API.

@Xuanwo Xuanwo requested a review from wjones127 March 25, 2026 18:40
Comment thread python/python/lance/dataset.py Outdated
Comment on lines +665 to +667
```python
def describe_index_segments(
    self, index_name: Optional[str] = None
) -> List[IndexSegmentDescription]:
```
Contributor

If we removed it in Rust, should we also remove it in Python and Java? Or is there a reason to keep it in these bindings?

Collaborator Author

Oops, I overlooked them.

@Xuanwo Xuanwo merged commit 8b8e36a into main Mar 27, 2026
28 checks passed
@Xuanwo Xuanwo deleted the xuanwo/index-segment-api branch March 27, 2026 08:16

Labels

enhancement New feature or request java python

2 participants