Collaborator

@lxy-9602 lxy-9602 commented Jan 29, 2026

Purpose

Linked issue: #69

This PR adds support for pre-filter and limit parameters in full-text search, enabling more flexible and efficient querying.

  • Supports pre-filtering via a row ID bitmap:
    Users can pass a RoaringBitmap64 as pre_filter to restrict the search to specific rows (e.g., rows returned by other global index lookups). If pre_filter is not set, all rows are considered.

  • Supports an optional limit for full-text search retrieval:

    • When limit is set: results are scored by similarity (e.g., TF-IDF), and only the top limit results are returned, sorted by score in descending order.
    • When limit = nullopt: all matching documents are returned without scoring or sorting, which suits boolean match queries that need no ranking.
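
The semantics described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `RowIdSet`, `Hit`, `SearchWithPreFilter`, and `score_fn` are hypothetical names, and a `std::unordered_set` stands in for RoaringBitmap64 so the example stays self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a row ID bitmap such as RoaringBitmap64.
using RowIdSet = std::unordered_set<int64_t>;

// Hypothetical result type; score stays 0 when scoring is skipped.
struct Hit {
    int64_t row_id;
    float score;
};

// matching_rows: rows that already satisfy the text query (assumed input).
// pre_filter:    if set, only rows contained in it are kept.
// limit:         if set, score the rows and return the top `limit` by score;
//                if nullopt, return every match, unscored and unsorted.
std::vector<Hit> SearchWithPreFilter(
    const std::vector<int64_t>& matching_rows,
    const std::optional<RowIdSet>& pre_filter,
    std::optional<std::size_t> limit,
    const std::function<float(int64_t)>& score_fn) {
    std::vector<Hit> hits;
    for (int64_t row : matching_rows) {
        // Drop rows outside the pre-filter, if one was provided.
        if (pre_filter && pre_filter->count(row) == 0) {
            continue;
        }
        hits.push_back(Hit{row, 0.0f});
    }
    if (!limit) {
        // Boolean-match mode: the scoring function is never invoked.
        return hits;
    }
    // Ranking mode: a scoring function is required.
    assert(static_cast<bool>(score_fn));
    for (Hit& hit : hits) {
        hit.score = score_fn(hit.row_id);
    }
    const std::size_t k = std::min(*limit, hits.size());
    // Move the k highest-scoring hits to the front, in descending order.
    std::partial_sort(hits.begin(),
                      hits.begin() + static_cast<std::ptrdiff_t>(k),
                      hits.end(),
                      [](const Hit& a, const Hit& b) { return a.score > b.score; });
    hits.resize(k);
    return hits;
}
```

In this sketch the pre-filter is applied before any scoring, so the scoring cost is paid only for rows that survive the filter, and not at all when limit is unset.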

Tests

LuceneInterfaceTest
LuceneGlobalIndexTest

API and Format

FullTextSearch

@lxy-9602 lxy-9602 changed the title from "feat: support filter & limit in full text search" to "feat: support pre-filter & limit in full text search" Jan 29, 2026
@lxy-9602 lxy-9602 force-pushed the fts-support-prefilter branch from 0061f65 to f6a9ba0 January 30, 2026 01:04
@lucasfang lucasfang requested a review from Copilot January 30, 2026 01:23
Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI left a comment

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/paimon/global_index/lucene/lucene_api_test.cpp:151

  • Variable name has a typo: 'resule_doc_id_vec' should be 'result_doc_id_vec'.
        std::vector<int32_t> resule_doc_id_vec;
        std::vector<std::wstring> result_doc_id_content_vec;
        for (auto score_doc : results->scoreDocs) {
            Lucene::DocumentPtr result_doc = searcher->doc(score_doc->doc);
            resule_doc_id_vec.push_back(score_doc->doc);
            result_doc_id_content_vec.push_back(result_doc->get(L"id"));
        }
        ASSERT_EQ(resule_doc_id_vec, expected_doc_id_vec);

@lxy-9602 lxy-9602 force-pushed the fts-support-prefilter branch from 5f0f1e5 to e896ad1 January 30, 2026 03:47
Collaborator

@lucasfang lucasfang left a comment

+1

@lucasfang lucasfang merged commit a6fe3b4 into alibaba:main Jan 30, 2026
8 checks passed