fix: respect fragment restrictions in vector and FTS searches when requested fragments by yingjianwu98 · Pull Request #5924 · lance-format/lance

yingjianwu98 · 2026-02-10T02:03:34Z

Fixes a bug where vector and FTS searches ignore the with_fragments() filter when querying unindexed fragments, so they return results from indexed fragments that were not requested.

Note: this does not fix the case where with_fragments contains both indexed and unindexed fragments for FTS and vector search. I have a separate follow-up PR to fix that behavior.

This PR combines my previous efforts to fix the issue:
#5081
#5080
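
To make the intended behavior concrete, here is a minimal, self-contained sketch of a fragment restriction applied to search hits. It is not the code in this PR: `Hit`, `fragment_id`, `make_addr`, and `restrict_to_fragments` are hypothetical names, and the sketch assumes that a row address stores the fragment ID in its upper 32 bits.

```rust
use std::collections::HashSet;

/// Hypothetical model of a search hit: a row address plus a relevance score.
#[derive(Debug, Clone, PartialEq)]
struct Hit {
    row_addr: u64,
    score: f32,
}

/// Assumption: the fragment ID occupies the upper 32 bits of the row address.
fn fragment_id(row_addr: u64) -> u32 {
    (row_addr >> 32) as u32
}

/// Builds a row address from a fragment ID and a row offset (same assumption).
fn make_addr(fragment: u32, offset: u32) -> u64 {
    ((fragment as u64) << 32) | offset as u64
}

/// Keeps only hits that belong to an explicitly requested fragment.
/// `None` means the caller set no fragment restriction, so all hits pass.
fn restrict_to_fragments(hits: Vec<Hit>, requested: Option<&HashSet<u32>>) -> Vec<Hit> {
    match requested {
        None => hits,
        Some(ids) => hits
            .into_iter()
            .filter(|hit| ids.contains(&fragment_id(hit.row_addr)))
            .collect(),
    }
}
```

With hits in fragments 0, 1, and 2 and a request for fragment 2 only, the sketch keeps just the fragment 2 hit. The bug described above corresponds to hits from fragments 0 and 1 leaking through.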

github-actions · 2026-02-10T02:03:49Z

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

codecov · 2026-02-10T02:55:18Z

Codecov Report

❌ Patch coverage is 97.31544% with 4 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
rust/lance/src/dataset/scanner.rs	97.31%	0 Missing and 4 partials ⚠️

wjones127

The implementation seems reasonable, but I would be more convinced if you improved the unit test.

wjones127 · 2026-02-12T00:52:36Z

+        let batches: Vec<_> = scanner
+            .try_into_stream()
+            .await
+            .unwrap()
+            .try_collect::<Vec<_>>()
+            .await
+            .unwrap();


For tests, I'd recommend just calling .try_into_batch(). That makes things simpler.

Suggested change

-        let batches: Vec<_> = scanner
-            .try_into_stream()
-            .await
-            .unwrap()
-            .try_collect::<Vec<_>>()
-            .await
-            .unwrap();
+        let batches = scanner
+            .try_into_batch()
+            .await
+            .unwrap();

wjones127 · 2026-02-12T00:56:13Z

+        // Now we have 3 fragments:
+        // Fragment 0: i=0..200 (indexed)
+        // Fragment 1: i=200..400 (indexed)
+        // Fragment 2: i=400..410 (unindexed)


I think this test would be more convincing if:

1. Running the query without the fragment filter produced rows in all fragments.
2. There was another unindexed fragment that you were excluding.
3. You had another test case where you were filtering for an indexed fragment.

Could you modify the tests to cover all three of those?

yingjianwu98 · 2026-02-12T16:32:21Z

Thanks @wjones127 !

I have addressed your comments.

I am working on another PR that fixes filtering for indexed fragments, but let me know if you think I should put them together.

wjones127

Thanks for adding those tests. I think this looks better, but would like to give you the opportunity to make the tests a bit shorter.

wjones127 · 2026-02-12T18:55:49Z

+        let batch = scanner.try_into_batch().await.unwrap();
+        let i_col = batch.column_by_name("i").unwrap();
+        let i_array = i_col.as_any().downcast_ref::<Int32Array>().unwrap();
+
+        // Should only get results from fragment 2 (i=400..410)
+        let mut has_results = false;
+        for idx in 0..i_array.len() {
+            has_results = true;
+            let val = i_array.value(idx);
+            assert!(
+                (400..410).contains(&val),
+                "Expected only values from fragment 2 (i=400..410), but got i={}",
+                val
+            );
+        }
+        assert!(has_results, "Expected some results from fragment 2");


You could simplify these tests to something like this:

Suggested change

-        let batch = scanner.try_into_batch().await.unwrap();
-        let i_col = batch.column_by_name("i").unwrap();
-        let i_array = i_col.as_any().downcast_ref::<Int32Array>().unwrap();
-
-        // Should only get results from fragment 2 (i=400..410)
-        let mut has_results = false;
-        for idx in 0..i_array.len() {
-            has_results = true;
-            let val = i_array.value(idx);
-            assert!(
-                (400..410).contains(&val),
-                "Expected only values from fragment 2 (i=400..410), but got i={}",
-                val
-            );
-        }
-        assert!(has_results, "Expected some results from fragment 2");
+        let batch = scanner.try_into_batch().await.unwrap();
+        assert!(batch.num_rows() > 0, "Expected some results from fragment 2");
+        batch["i"]
+            .as_primitive::<Int32Type>()
+            .iter()
+            // iter() yields Option<i32>; flatten() skips nulls
+            .flatten()
+            .for_each(|val| {
+                assert!(
+                    (400..410).contains(&val),
+                    "Expected only values from fragment 2 (i=400..410), but got i={}",
+                    val
+                );
+            });

wjones127 · 2026-02-12T19:00:54Z

+        // Test 2: Query only one unindexed fragment (fragment 2), excluding fragment 3
+        let fragment_2 = vec![fragments[2].clone()];
+
+        let mut scanner = test_ds.dataset.scan();
+        scanner
+            .full_text_search(FullTextSearchQuery::new("s-405".into()))
+            .unwrap()
+            .with_fragments(fragment_2);
+
+        let batch = scanner.try_into_batch().await.unwrap();
+        let i_col = batch.column_by_name("i").unwrap();
+        let i_array = i_col.as_any().downcast_ref::<Int32Array>().unwrap();
+
+        // Should only get results from fragment 2 (i=400..410)
+        let mut has_results = false;
+        for idx in 0..i_array.len() {
+            has_results = true;
+            let val = i_array.value(idx);
+            assert!(
+                (400..410).contains(&val),
+                "Expected only values from fragment 2 (i=400..410), but got i={}",
+                val
+            );
+        }
+        assert!(has_results, "Expected some results from fragment 2");


I like the number of tests better, but they do seem verbose. Could you refactor these into a common test function? That would make them a lot shorter.
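
One possible shape for such a shared helper is sketched below. It is an illustration only, not the refactor that landed in this PR: `assert_all_in_range` is a hypothetical name, and it takes a plain `&[i32]` slice so that the sketch compiles without the Arrow or Lance crates. In the real tests the values would come from the `i` column of the returned batch.

```rust
use std::ops::Range;

/// Hypothetical shared test helper: asserts that `values` is non-empty and
/// that every value falls inside `expected`. `label` names the fragment(s)
/// under test and only appears in failure messages.
fn assert_all_in_range(values: &[i32], expected: Range<i32>, label: &str) {
    assert!(!values.is_empty(), "Expected some results from {}", label);
    for &val in values {
        assert!(
            expected.contains(&val),
            "Expected only values from {} (i={}..{}), but got i={}",
            label,
            expected.start,
            expected.end,
            val
        );
    }
}
```

Each test case then reduces to building the scanner, collecting the `i` values, and making a single call such as `assert_all_in_range(&values, 400..410, "fragment 2")`.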

…5953) If a client specifies .with_fragments, vector and FTS searches on indexed fragments should respect the target fragments. Previous PR to fix the unindexed fragments path: #5924 Co-authored-by: stevie9868 <yingjianwu2@email.com> Co-authored-by: Will Jones <willjones127@gmail.com>

respect fragment restrictions in vector and FTS searches for unindex …

e7e1885

…cases refactor refactor

yingjianwu98 changed the title ~~Respect fragment restrictions in vector and FTS searches when requested fragments~~ fix: Respect fragment restrictions in vector and FTS searches when requested fragments Feb 10, 2026

github-actions Bot added the bug label Feb 10, 2026

wjones127 requested changes Feb 12, 2026

wjones127 changed the title ~~fix: Respect fragment restrictions in vector and FTS searches when requested fragments~~ fix: respect fragment restrictions in vector and FTS searches when requested fragments Feb 12, 2026

wjones127 self-assigned this Feb 12, 2026

This was referenced Feb 12, 2026

fix: fts search should respect target fragment for scan #5081

Closed

fix: vector search should respect target fragment #5080

Closed

address comments

f0c3e81

fix: address clippy warnings - use is_some_and instead of map_or

1f5e3b8

wjones127 reviewed Feb 12, 2026

address comments

346749f

yingjianwu98 requested a review from wjones127 February 13, 2026 02:02

yingjianwu98 mentioned this pull request Feb 14, 2026

fix: respect requested indexed fragment in vector and FTS searches #5953

Merged

wjones127 approved these changes Feb 17, 2026

wjones127 merged commit 79b700f into lance-format:main Feb 17, 2026
29 checks passed

everySympathy mentioned this pull request Mar 11, 2026

feat(lance): vector search support approximate Top‑N retrieval for large N Eventual-Inc/Daft#6379

Closed

andrea-reale mentioned this pull request Mar 30, 2026

emilk/fix write starvation rerun-io/lance#12

Closed

shmilygkd mentioned this pull request Apr 14, 2026

perf: use dataset-level scan for indexed vector search to avoid per-fragment redundancy lance-format/lance-spark#432

Closed

wjones127 merged 4 commits into lance-format:main from yingjianwu98:yingjianw/fix_unindex_search
