
fix: respect fragment restrictions in vector and FTS searches when requested fragments #5924

Merged
wjones127 merged 4 commits into lance-format:main from yingjianwu98:yingjianw/fix_unindex_search
Feb 17, 2026

Conversation

Contributor

@yingjianwu98 yingjianwu98 commented Feb 10, 2026

Fixes a bug where vector and FTS searches ignored the with_fragments() filter when querying unindexed fragments, so results could come from indexed fragments that were not requested.

Note that this does not fix the case where with_fragments() contains both an indexed fragment and an unindexed fragment for FTS and vector search. I have a separate follow-up PR to fix that behavior.

This PR combines my previous efforts to fix the issue:
#5081
#5080
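
The intended semantics can be illustrated with a toy model. The sketch below is not Lance's implementation: `Row`, `sample_rows`, and `search` are hypothetical names, and the fragment layout (two indexed fragments, one unindexed) is only an assumption chosen for illustration. It shows a fragment restriction being applied on both the indexed and the unindexed search path, so that no hit can come from a fragment that was not requested.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a dataset row: its fragment and its `i` value.
#[derive(Debug, Clone, Copy)]
struct Row {
    fragment_id: u64,
    i: i32,
}

/// Assumed layout: fragments 0 and 1 hold i=0..200 and i=200..400,
/// fragment 2 holds i=400..410.
fn sample_rows() -> Vec<Row> {
    let layout = vec![(0u64, 0i32..200), (1, 200..400), (2, 400..410)];
    layout
        .into_iter()
        .flat_map(|(fragment_id, range)| range.map(move |i| Row { fragment_id, i }))
        .collect()
}

/// Toy search returning the `i` value of every matching row.
/// `indexed` lists the fragments covered by an imaginary index;
/// `requested` is the optional fragment restriction (None = all fragments).
fn search<F>(
    rows: &[Row],
    indexed: &HashSet<u64>,
    requested: Option<&HashSet<u64>>,
    matches: F,
) -> Vec<i32>
where
    F: Fn(&Row) -> bool,
{
    // A fragment is allowed when there is no restriction or it was requested.
    let allowed = |fragment_id: u64| requested.map_or(true, |set| set.contains(&fragment_id));

    // Indexed path: hits from the index are filtered by the restriction.
    let index_hits = rows
        .iter()
        .filter(|r| indexed.contains(&r.fragment_id) && allowed(r.fragment_id) && matches(*r));

    // Unindexed path: the flat scan is filtered by the same restriction.
    let scan_hits = rows
        .iter()
        .filter(|r| !indexed.contains(&r.fragment_id) && allowed(r.fragment_id) && matches(*r));

    index_hits.chain(scan_hits).map(|r| r.i).collect()
}
```

In this model, restricting the search to fragment 2 yields only values in 400..410, while omitting the restriction yields hits from all three fragments.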

@github-actions
Contributor

ACTION NEEDED
Lance follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

For details on the error please inspect the "PR Title Check" action.

@yingjianwu98 yingjianwu98 changed the title Respect fragment restrictions in vector and FTS searches when requested fragments fix: Respect fragment restrictions in vector and FTS searches when requested fragments Feb 10, 2026
@github-actions github-actions Bot added the bug Something isn't working label Feb 10, 2026
codecov Bot commented Feb 10, 2026

Codecov Report

❌ Patch coverage is 97.31544% with 4 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance/src/dataset/scanner.rs | 97.31% | 0 Missing and 4 partials ⚠️ |

Contributor

@wjones127 wjones127 left a comment

The implementation seems reasonable, but I would be more convinced if you improved the unit test.

Comment thread rust/lance/src/dataset/scanner.rs Outdated
Comment on lines +9205 to +9211
let batches: Vec<_> = scanner
.try_into_stream()
.await
.unwrap()
.try_collect::<Vec<_>>()
.await
.unwrap();

For tests, I'd recommend just calling .try_into_batch(). That makes things simpler.

Suggested change
- let batches: Vec<_> = scanner
-     .try_into_stream()
-     .await
-     .unwrap()
-     .try_collect::<Vec<_>>()
-     .await
-     .unwrap();
+ let batches = scanner
+     .try_into_batch()
+     .await
+     .unwrap();

Comment thread rust/lance/src/dataset/scanner.rs Outdated
Comment on lines +9188 to +9191
// Now we have 3 fragments:
// Fragment 0: i=0..200 (indexed)
// Fragment 1: i=200..400 (indexed)
// Fragment 2: i=400..410 (unindexed)

I think this test would be more convincing if:

  1. Running the query without the fragment filter produced rows from all fragments
  2. There was another unindexed fragment that you were excluding
  3. You had another test case where you were filtering for an indexed fragment.

Could you modify the tests to cover all three of those?

@wjones127 wjones127 changed the title fix: Respect fragment restrictions in vector and FTS searches when requested fragments fix: respect fragment restrictions in vector and FTS searches when requested fragments Feb 12, 2026
@wjones127 wjones127 self-assigned this Feb 12, 2026
@yingjianwu98
Contributor Author

Thanks @wjones127 !

I have addressed your comments.

I am working on another PR that fixes filtering for indexed fragments, but let me know if you think I should put the two together.

Contributor

@wjones127 wjones127 left a comment


Thanks for adding those tests. I think this looks better, but would like to give you the opportunity to make the tests a bit shorter.

Comment thread rust/lance/src/dataset/scanner.rs Outdated
Comment on lines +9183 to +9198
let batch = scanner.try_into_batch().await.unwrap();
let i_col = batch.column_by_name("i").unwrap();
let i_array = i_col.as_any().downcast_ref::<Int32Array>().unwrap();

// Should only get results from fragment 2 (i=400..410)
let mut has_results = false;
for idx in 0..i_array.len() {
has_results = true;
let val = i_array.value(idx);
assert!(
(400..410).contains(&val),
"Expected only values from fragment 2 (i=400..410), but got i={}",
val
);
}
assert!(has_results, "Expected some results from fragment 2");

You could simplify these tests to something like this:

Suggested change
- let batch = scanner.try_into_batch().await.unwrap();
- let i_col = batch.column_by_name("i").unwrap();
- let i_array = i_col.as_any().downcast_ref::<Int32Array>().unwrap();
- // Should only get results from fragment 2 (i=400..410)
- let mut has_results = false;
- for idx in 0..i_array.len() {
-     has_results = true;
-     let val = i_array.value(idx);
-     assert!(
-         (400..410).contains(&val),
-         "Expected only values from fragment 2 (i=400..410), but got i={}",
-         val
-     );
- }
- assert!(has_results, "Expected some results from fragment 2");
+ let batch = scanner.try_into_batch().await.unwrap();
+ assert!(batch.num_rows() > 0, "Expected some results from fragment 2");
+ batch["i"]
+     .as_primitive::<Int32Type>()
+     .iter()
+     .flatten() // iter() yields Option<i32>; skip nulls before the range check
+     .for_each(|val| {
+         assert!(
+             (400..410).contains(&val),
+             "Expected only values from fragment 2 (i=400..410), but got i={}",
+             val
+         );
+     });

Comment thread rust/lance/src/dataset/scanner.rs Outdated
Comment on lines +9309 to +9333
// Test 2: Query only one unindexed fragment (fragment 2), excluding fragment 3
let fragment_2 = vec![fragments[2].clone()];

let mut scanner = test_ds.dataset.scan();
scanner
.full_text_search(FullTextSearchQuery::new("s-405".into()))
.unwrap()
.with_fragments(fragment_2);

let batch = scanner.try_into_batch().await.unwrap();
let i_col = batch.column_by_name("i").unwrap();
let i_array = i_col.as_any().downcast_ref::<Int32Array>().unwrap();

// Should only get results from fragment 2 (i=400..410)
let mut has_results = false;
for idx in 0..i_array.len() {
has_results = true;
let val = i_array.value(idx);
assert!(
(400..410).contains(&val),
"Expected only values from fragment 2 (i=400..410), but got i={}",
val
);
}
assert!(has_results, "Expected some results from fragment 2");

I like the number of tests better, but they do seem verbose. Could you refactor these into a common test function? That would make them a lot shorter.
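
One possible shape for such a shared helper is sketched below. It is only an illustration: `check_values_in_range` is a hypothetical name, and the slice of `i32` values stands in for the `i` column extracted from a result batch. The helper reports an error message rather than panicking, so each test case can simply `unwrap()` the result.

```rust
use std::ops::Range;

/// Hypothetical shared check for the fragment tests.
/// `values` stands in for the `i` column of a result batch, `expected` is
/// the value range of the requested fragment, and `label` names that
/// fragment in error messages.
fn check_values_in_range(
    values: &[i32],
    expected: &Range<i32>,
    label: &str,
) -> Result<(), String> {
    // An empty result would make the range check pass vacuously, so reject it.
    if values.is_empty() {
        return Err(format!("Expected some results from {}", label));
    }
    // Report the first value that falls outside the requested fragment.
    match values.iter().find(|v| !expected.contains(*v)) {
        Some(bad) => Err(format!(
            "Expected only values from {} (i={:?}), but got i={}",
            label, expected, bad
        )),
        None => Ok(()),
    }
}
```

Each test case then reduces to building the scanner, collecting the column values, and making one call such as `check_values_in_range(&values, &(400..410), "fragment 2").unwrap()`.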

@wjones127 wjones127 merged commit 79b700f into lance-format:main Feb 17, 2026
29 checks passed
wjones127 added a commit that referenced this pull request Feb 23, 2026
…5953)

If the client specifies .with_fragments, vector and FTS searches on indexed
fragments should respect the target fragments.

Previous PR to fix the unindexed fragments path:
#5924

Co-authored-by: stevie9868 <yingjianwu2@email.com>
Co-authored-by: Will Jones <willjones127@gmail.com>

Labels

bug Something isn't working
