feat: support pre-filter & limit in full text search #97
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
src/paimon/global_index/lucene/lucene_api_test.cpp:151
- Variable name has a typo: 'resule_doc_id_vec' should be 'result_doc_id_vec'.
std::vector<int32_t> resule_doc_id_vec;
std::vector<std::wstring> result_doc_id_content_vec;
for (auto score_doc : results->scoreDocs) {
    Lucene::DocumentPtr result_doc = searcher->doc(score_doc->doc);
    resule_doc_id_vec.push_back(score_doc->doc);
    result_doc_id_content_vec.push_back(result_doc->get(L"id"));
}
ASSERT_EQ(resule_doc_id_vec, expected_doc_id_vec);
lucasfang left a comment
+1
Purpose
Linked issue: #69
This PR adds support for `pre-filter` and `limit` parameters in full-text search, enabling more flexible and efficient querying.
Supports pre-filtering via row ID bitmap:
Users can provide a `RoaringBitmap64` as `pre_filter` to restrict the search space to specific rows (e.g., from other global index lookups). If not set, all rows are considered.
Supports optional `limit` for full-text search retrieval:
- `limit` is set: Results are scored using similarity (e.g., tf-idf), and only the top `limit` results are returned, sorted by score (descending).
- `limit = nullopt`: All matching documents are returned without scoring or sorting; ideal for boolean match queries with no ranking needed.
Tests
LuceneInterfaceTest
LuceneGlobalIndexTest
API and Format
FullTextSearch
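The pre-filter and limit semantics described under Purpose can be illustrated with a small standalone sketch. This is not the code from this PR: `Match`, `SearchSketch`, and `RunSketchExamples` are hypothetical names, a `std::set<int64_t>` stands in for `RoaringBitmap64`, and the scores are supplied by hand rather than computed by Lucene. When no limit is given, the sketch simply keeps the input order, whereas the PR description only promises that results are unscored and unsorted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <set>
#include <vector>

// Hypothetical stand-in for one matching document and its similarity score.
struct Match {
    int64_t row_id;
    float score;  // assumed to be precomputed (e.g. a tf-idf style score)
};

// Sketch of the described behaviour. std::set replaces RoaringBitmap64 here.
// - pre_filter unset: every row is considered.
// - limit set: sort by score (descending) and keep the top `limit` rows.
// - limit unset: return all surviving rows without sorting.
std::vector<int64_t> SearchSketch(const std::vector<Match>& matches,
                                  const std::optional<std::set<int64_t>>& pre_filter,
                                  std::optional<std::size_t> limit) {
    std::vector<Match> kept;
    for (const Match& m : matches) {
        if (!pre_filter || pre_filter->count(m.row_id) > 0) {
            kept.push_back(m);
        }
    }
    if (limit) {
        std::stable_sort(kept.begin(), kept.end(),
                         [](const Match& a, const Match& b) { return a.score > b.score; });
        if (kept.size() > *limit) {
            kept.erase(kept.begin() + static_cast<std::ptrdiff_t>(*limit), kept.end());
        }
    }
    std::vector<int64_t> row_ids;
    for (const Match& m : kept) {
        row_ids.push_back(m.row_id);
    }
    return row_ids;
}

// Usage example with made-up data.
void RunSketchExamples() {
    const std::vector<Match> matches = {{1, 0.2f}, {2, 0.9f}, {3, 0.5f}, {4, 0.7f}};
    const std::set<int64_t> filter = {2, 3, 4};

    // Pre-filter plus limit: rows 2, 3, 4 survive; the two best scores win.
    assert((SearchSketch(matches, filter, 2) == std::vector<int64_t>{2, 4}));
    // Pre-filter without limit: all surviving rows, no ranking applied.
    assert((SearchSketch(matches, filter, std::nullopt) == std::vector<int64_t>{2, 3, 4}));
}
```

Passing `std::nullopt` for both arguments returns every match, which corresponds to the plain boolean-match case with no ranking.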