feat!: support sampling selected fragments by Xuanwo · Pull Request #6294 · lance-format/lance

Xuanwo · 2026-03-25T17:10:07Z

This extends Dataset::sample to accept optional fragment IDs so grouped training jobs can sample from selected fragments without reimplementing deletion-aware row offset mapping. It rejects empty fragment selections and unknown fragment IDs while preserving the existing behavior for whole-dataset sampling.
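The sketch below covers only the drawing step. It assumes, as the review comment on double row counting implies, that sampling first sums the live (post-deletion) row counts of the selected fragments. It then picks distinct logical offsets below that total, which are later mapped to row addresses. It uses Robert Floyd's sampling-without-replacement algorithm and a small SplitMix64 generator so it needs no external crates. Both are illustrative choices and not necessarily what Lance does. `SplitMix64` and `sample_offsets` are hypothetical names, not Lance APIs.

```rust
use std::collections::BTreeSet;

/// Tiny deterministic PRNG (SplitMix64) so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Value in 0..=max (modulo bias ignored for brevity).
    fn up_to(&mut self, max: u64) -> u64 {
        self.next_u64() % (max + 1)
    }
}

/// Floyd's algorithm: draw `k` distinct logical offsets from `0..n`
/// (n = live rows across the selected fragments), in ascending order.
/// Returns None when more rows are requested than exist.
fn sample_offsets(n: u64, k: u64, rng: &mut SplitMix64) -> Option<Vec<u64>> {
    if k > n {
        return None;
    }
    let mut chosen = BTreeSet::new();
    for j in (n - k)..n {
        let t = rng.up_to(j);
        // If t was already picked, j is guaranteed to be new (every earlier
        // pick is <= j - 1), so insert j instead.
        if !chosen.insert(t) {
            chosen.insert(j);
        }
    }
    Some(chosen.into_iter().collect())
}

fn main() {
    // Hypothetical live-row counts for two selected fragments.
    let live_rows_per_fragment = [3u64, 3];
    let total: u64 = live_rows_per_fragment.iter().sum();
    let mut rng = SplitMix64(42);
    let offsets = sample_offsets(total, 4, &mut rng).expect("k <= n");
    println!("{offsets:?}");
}
```

The sampled offsets are logical positions among live rows only. This is why a deletion-aware mapping step is still needed before rows can be fetched.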

github-actions · 2026-03-25T17:12:19Z

PR Review

Clean, well-structured change. The refactoring of row_offsets_to_row_addresses to accept &[FileFragment] instead of &Dataset is a nice generalization that enables the subset sampling without duplicating logic.
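The self-contained sketch below illustrates why taking a fragment slice generalizes the mapping. It makes several assumptions:

- `FragmentInfo` is a hypothetical stand-in for `FileFragment`.
- Deletions are modeled as a sorted set of physical offsets.
- The row-address encoding `(fragment_id << 32) | physical_offset` is assumed, not stated in this thread.
- `offsets_to_addresses` is a simplified analogue named differently from `row_offsets_to_row_addresses` so it is not mistaken for the real signature.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a fragment: its ID, how many rows were written
/// physically, and which physical offsets have since been deleted.
struct FragmentInfo {
    id: u64,
    physical_rows: u64,
    deleted: BTreeSet<u64>,
}

impl FragmentInfo {
    /// Live (non-deleted) rows, i.e. a deletion-aware row count.
    fn live_rows(&self) -> u64 {
        self.physical_rows - self.deleted.len() as u64
    }

    /// Physical offset of the k-th live row: walk deletions in ascending
    /// order and shift past every deleted slot at or before the candidate.
    fn physical_offset(&self, k: u64) -> u64 {
        let mut p = k;
        for &d in &self.deleted {
            if d <= p {
                p += 1;
            } else {
                break;
            }
        }
        p
    }
}

/// Map ascending logical offsets (into the live rows of `fragments`,
/// concatenated in slice order) to row addresses, assuming the encoding
/// `(fragment_id << 32) | physical_offset`. Panics if an offset is out of range.
fn offsets_to_addresses(fragments: &[FragmentInfo], offsets: &[u64]) -> Vec<u64> {
    let mut out = Vec::with_capacity(offsets.len());
    let mut idx = 0;
    let mut start = 0; // logical offset at which fragments[idx] begins
    for &offset in offsets {
        // Skip whole fragments that end at or before this offset.
        while idx < fragments.len() && offset >= start + fragments[idx].live_rows() {
            start += fragments[idx].live_rows();
            idx += 1;
        }
        let frag = &fragments[idx];
        out.push((frag.id << 32) | frag.physical_offset(offset - start));
    }
    out
}

fn main() {
    // Select fragments 0 and 2 only; fragment 2 has physical rows 1 and 3 deleted.
    let selected = vec![
        FragmentInfo { id: 0, physical_rows: 3, deleted: BTreeSet::new() },
        FragmentInfo { id: 2, physical_rows: 5, deleted: BTreeSet::from([1, 3]) },
    ];
    for a in offsets_to_addresses(&selected, &[0, 2, 3, 5]) {
        println!("fragment {} row {}", a >> 32, a & 0xFFFF_FFFF);
    }
}
```

With fragment 2's rows 1 and 3 deleted, logical offsets 3 and 5 land on physical rows 0 and 4 of fragment 2. The function only sees the slice it is given, so the same code serves both cases. Whole-dataset sampling passes every fragment, and subset sampling passes only the selected ones.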

One minor observation (not blocking)

Double row counting for fragment-subset path: When fragment_ids is Some, count_rows is called for each selected fragment twice — once in the buffer_unordered(16) fold to compute num_rows, and again sequentially inside row_offsets_to_row_addresses. If count_rows isn't cached, this doubles the I/O for deletion vector reads. You could pass the precomputed total into row_offsets_to_row_addresses or pre-collect the per-fragment counts, but it's likely negligible for typical workloads.
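The reviewer's suggestion can be sketched synchronously. Count each selected fragment once and keep cumulative totals. The same vector then answers both "how many rows in total?" and "which fragment holds logical offset k?". `FragmentCounts` and `collect` are hypothetical names, and the closure stands in for `count_rows`. The real code gathers counts concurrently with `buffer_unordered(16)`, which this sketch omits.

```rust
use std::cell::Cell;

/// Hypothetical cache of per-fragment live-row counts, gathered once and
/// reused for both the total and the offset-to-fragment lookup.
struct FragmentCounts {
    ids: Vec<u64>,
    /// cumulative[i] = live rows in the first i + 1 fragments.
    cumulative: Vec<u64>,
}

impl FragmentCounts {
    /// Calls `count_rows` exactly once per fragment, in the given order.
    fn collect(ids: &[u64], mut count_rows: impl FnMut(u64) -> u64) -> Self {
        let mut cumulative = Vec::with_capacity(ids.len());
        let mut running = 0;
        for &id in ids {
            running += count_rows(id);
            cumulative.push(running);
        }
        Self { ids: ids.to_vec(), cumulative }
    }

    /// Total live rows across all selected fragments (the row count to sample from).
    fn total(&self) -> u64 {
        self.cumulative.last().copied().unwrap_or(0)
    }

    /// (fragment_id, local logical offset) for a global logical offset, via binary search.
    fn locate(&self, offset: u64) -> Option<(u64, u64)> {
        if offset >= self.total() {
            return None;
        }
        // First fragment whose cumulative end is strictly greater than `offset`;
        // this also skips fragments with zero live rows.
        let idx = self.cumulative.partition_point(|&end| end <= offset);
        let start = if idx == 0 { 0 } else { self.cumulative[idx - 1] };
        Some((self.ids[idx], offset - start))
    }
}

fn main() {
    let calls = Cell::new(0);
    // Stand-in for count_rows: every fragment reports 3 live rows.
    let counts = FragmentCounts::collect(&[0, 2], |_id| {
        calls.set(calls.get() + 1);
        3
    });
    // Prints: total=6 calls=2 locate(3)=Some((2, 0))
    println!("total={} calls={} locate(3)={:?}", counts.total(), calls.get(), counts.locate(3));
}
```

If the counts are gathered concurrently, each count must stay paired with its fragment ID, or be collected with an order-preserving combinator such as `buffered`. This matters because `buffer_unordered` yields results in completion order, not input order.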

Tests

Good coverage: subset sampling with deletions, empty fragment list rejection, unknown fragment ID rejection. The duplicate fragment ID in test_sample_with_fragment_ids ([0, 0, 2]) is a nice touch for exercising the dedup via BTreeSet.
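These tests cover three validation behaviors: deduplicating `[0, 0, 2]`, rejecting an empty list and rejecting unknown IDs. They can be sketched as a small helper over plain `u64` IDs. `resolve_fragment_ids` is a hypothetical name and the error type is simplified to `String`. The real `Dataset::sample` signature and error variants are not shown in this thread.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper mirroring the validation described in this PR:
/// - `None` keeps whole-dataset sampling (every available fragment);
/// - an empty selection is an error;
/// - an unknown fragment ID is an error;
/// - duplicates collapse, and the result comes back sorted (BTreeSet order).
fn resolve_fragment_ids(available: &[u64], requested: Option<&[u64]>) -> Result<Vec<u64>, String> {
    let Some(requested) = requested else {
        return Ok(available.to_vec());
    };
    if requested.is_empty() {
        return Err("fragment_ids must not be empty".to_string());
    }
    let known: BTreeSet<u64> = available.iter().copied().collect();
    let mut selected = BTreeSet::new();
    for &id in requested {
        if !known.contains(&id) {
            return Err(format!("unknown fragment id: {id}"));
        }
        selected.insert(id);
    }
    Ok(selected.into_iter().collect())
}

fn main() {
    let available = [0u64, 1, 2, 3];
    let with_duplicates = [0u64, 0, 2];
    let empty: [u64; 0] = [];
    let unknown = [7u64];
    // Prints Ok([0, 2]): the repeated 0 is collapsed.
    println!("{:?}", resolve_fragment_ids(&available, Some(&with_duplicates[..])));
    // Prints Err("fragment_ids must not be empty")
    println!("{:?}", resolve_fragment_ids(&available, Some(&empty[..])));
    // Prints Err("unknown fragment id: 7")
    println!("{:?}", resolve_fragment_ids(&available, Some(&unknown[..])));
}
```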

LGTM.

codecov · 2026-03-25T17:56:40Z

Codecov Report

❌ Patch coverage is 87.93103% with 7 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
rust/lance/src/dataset.rs	90.38%	0 Missing and 5 partials ⚠️
rust/lance/src/dataset/take.rs	66.66%	0 Missing and 1 partial ⚠️
rust/lance/src/index/vector/utils.rs	0.00%	0 Missing and 1 partial ⚠️


wjones127

This looks good. Have a few minor suggestions on tests, but those are optional.


A follow-up PR builds on #6294 and exposes the remaining pieces needed to construct non-shared centroid vector index builds. It adds fragment-scoped IVF/PQ training in Rust and exports the same training flow to Python. Users can then train per-segment artifacts and feed them into the existing distributed build path.

feat: support sampling selected fragments

641dd31

github-actions Bot added the enhancement label Mar 25, 2026

Merge branch 'main' into xuanwo/sample-fragment-ids

06563d7

Xuanwo force-pushed the xuanwo/sample-fragment-ids branch from 16ffe50 to 06563d7 March 25, 2026 17:26

Xuanwo changed the title ~~feat: support sampling selected fragments~~ feat!: support sampling selected fragments Mar 25, 2026

github-actions Bot added the breaking-change label Mar 25, 2026

github-actions Bot added the python label Mar 25, 2026

Xuanwo force-pushed the xuanwo/sample-fragment-ids branch from c5aed6c to 06563d7 March 25, 2026 18:14

Xuanwo mentioned this pull request Mar 25, 2026

feat: support non-shared centroid vector index builds #6296

Merged

wjones127 approved these changes Mar 25, 2026


Comment thread rust/lance/src/dataset/tests/dataset_io.rs

Comment thread rust/lance/src/dataset/tests/dataset_io.rs

Xuanwo added 2 commits March 26, 2026 18:00

test: simplify sample fragment validation tests

a663700

Merge remote-tracking branch 'origin/main' into xuanwo/sample-fragment-ids

09610ac

wjones127 reviewed Mar 26, 2026


Comment thread rust/lance/src/dataset/tests/dataset_io.rs Outdated

test: restore storage-version sample coverage

306cb3d

Xuanwo merged commit f0aa55f into main Mar 26, 2026
28 checks passed

Xuanwo deleted the xuanwo/sample-fragment-ids branch March 26, 2026 20:33

wjones127 pushed a commit to wjones127/lance that referenced this pull request Mar 29, 2026

feat!: support sampling selected fragments (lance-format#6294)

52dab0d

Xuanwo merged 5 commits into main from xuanwo/sample-fragment-ids


2 participants
