feat: support full-text search with Lucene++ #80
Merged
25 commits
50f6261 feat: support full text search with Lucene++ (lxy-9602)
3975ee6 fix2 (lxy-9602)
3a66242 fix2 (lxy-9602)
8ff8595 fix3 (lxy-9602)
c8429c7 fix4 (lxy-9602)
14ee811 fix (lxy-9602)
8a24bcd fix (lxy-9602)
8da7e84 fix (lxy-9602)
8059d56 fix (lxy-9602)
21ca74f fix0554 (lxy-9602)
afa0bef fix0554 (lxy-9602)
5457513 fix (lxy-9602)
8faf048 fix boost (lxy-9602)
b2b64ba fix (lxy-9602)
4b70631 fix (lxy-9602)
07a2f30 fix0430 (lxy-9602)
b9edb8e fix (lxy-9602)
a644113 fix 0532 (lxy-9602)
6a7ecc8 fix 0715 (lxy-9602)
89b2bab fix (lxy-9602)
6eab383 fix 2109 (lxy-9602)
8981f02 fix row id in lucene index from int64 to int32 (lxy-9602)
9198e39 fix review (lxy-9602)
564ec56 fix (lxy-9602)
88d5e24 Merge branch 'main' into full-text-seach (lxy-9602)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,74 @@ | ||
| /* | ||
| * Copyright 2026-present Alibaba Inc. | ||
| * | ||
| * Licensed under the Apache License, Version 2.0 (the "License"); | ||
| * you may not use this file except in compliance with the License. | ||
| * You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
|
|
||
| #pragma once | ||
| #include <cstdint> | ||
| #include <functional> | ||
| #include <map> | ||
| #include <memory> | ||
| #include <optional> | ||
| #include <string> | ||
| #include <vector> | ||
|
|
||
| #include "paimon/predicate/predicate.h" | ||
| #include "paimon/visibility.h" | ||
|
|
||
| namespace paimon { | ||
| /// A configuration structure for full-text search operations. | ||
| struct PAIMON_EXPORT FullTextSearch { | ||
| /// Enumeration of supported full-text search types. | ||
| enum class SearchType { | ||
| /// All terms in the query must be present (AND semantics). | ||
| MATCH_ALL = 1, | ||
| /// Any term in the query can match (OR semantics). | ||
| MATCH_ANY = 2, | ||
| /// Matches the exact sequence of words (with proximity). | ||
| PHRASE = 3, | ||
| /// Matches terms starting with the given string (e.g., "run*" → running, runner). | ||
| PREFIX = 4, | ||
| /// Supports wildcards * and ? (e.g., "ap*e", "app?e" -> "apple"). | ||
| WILDCARD = 5, | ||
| /// Default/fallback type for unrecognized or invalid queries. | ||
| UNKNOWN = 128 | ||
| }; | ||
|
|
||
| FullTextSearch(const std::string& _field_name, int32_t _limit, const std::string& _query, | ||
| const SearchType& _search_type) | ||
| : field_name(_field_name), limit(_limit), query(_query), search_type(_search_type) {} | ||
|
|
||
| /// Name of the field to search within (must be a full-text indexed field). | ||
| std::string field_name; | ||
| /// Maximum number of documents to return; results are ordered by score. | ||
| int32_t limit; | ||
| /// The query string to search for. The interpretation depends on search_type: | ||
| /// | ||
| /// - For MATCH_ALL/MATCH_ANY: keywords are split into terms using the **same analyzer as | ||
| /// indexing**. | ||
| /// Example: "Hello World" → terms ["hello", "world"] (after lowercasing and tokenization). | ||
| /// | ||
| /// - For PHRASE: matches the exact word sequence (with optional slop). The query is also analyzed. | ||
| /// | ||
| /// - For PREFIX: matches terms starting with the given string (e.g., "run" → running, runner). | ||
| /// Only the prefix part is considered; analysis will not be applied. | ||
| /// | ||
| /// - For WILDCARD: supports wildcards * and ? (e.g., "ap*e", "app?e"). | ||
| /// Not passed through analyzer — matched directly against indexed terms. | ||
| /// | ||
| /// @note Analyzer consistency between indexing and querying is critical for correctness. | ||
| std::string query; | ||
| /// Type of search to perform. | ||
| SearchType search_type; | ||
| }; | ||
| } // namespace paimon | ||
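The doc comments on `query` distinguish analyzed search types (MATCH_ALL, MATCH_ANY) from unanalyzed ones (PREFIX). The sketch below illustrates that difference only. It does not use the Paimon or Lucene++ API: the `sketch` namespace, the local mirror of `FullTextSearch`, and the helpers `Analyze`, `Contains`, `Matches`, and `CheckExamples` are all hypothetical, and the toy analyzer (whitespace split plus ASCII lowercasing) is an assumption standing in for whatever analyzer the index really uses. PHRASE and WILDCARD are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

namespace sketch {  // hypothetical namespace, not part of Paimon

// Local mirror of the struct in the diff above, so the example compiles on its own.
struct FullTextSearch {
    enum class SearchType { MATCH_ALL = 1, MATCH_ANY = 2, PHRASE = 3,
                            PREFIX = 4, WILDCARD = 5, UNKNOWN = 128 };
    std::string field_name;
    int32_t limit;
    std::string query;
    SearchType search_type;
};

// Toy analyzer (assumption): split on whitespace and lowercase ASCII letters.
std::vector<std::string> Analyze(const std::string& text) {
    std::vector<std::string> terms;
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        std::transform(token.begin(), token.end(), token.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        terms.push_back(token);
    }
    return terms;
}

bool Contains(const std::vector<std::string>& terms, const std::string& term) {
    return std::find(terms.begin(), terms.end(), term) != terms.end();
}

// Decides whether one document matches, under the toy rules described above.
bool Matches(const FullTextSearch& search, const std::string& document) {
    const std::vector<std::string> doc_terms = Analyze(document);  // "indexed" terms
    switch (search.search_type) {
        case FullTextSearch::SearchType::MATCH_ALL: {
            // The query goes through the same analyzer; every term must be present.
            const std::vector<std::string> q = Analyze(search.query);
            return !q.empty() && std::all_of(q.begin(), q.end(), [&](const std::string& t) {
                return Contains(doc_terms, t);
            });
        }
        case FullTextSearch::SearchType::MATCH_ANY: {
            // The query goes through the same analyzer; one present term is enough.
            const std::vector<std::string> q = Analyze(search.query);
            return std::any_of(q.begin(), q.end(), [&](const std::string& t) {
                return Contains(doc_terms, t);
            });
        }
        case FullTextSearch::SearchType::PREFIX:
            // The query is NOT analyzed: it is compared as-is against indexed terms.
            return std::any_of(doc_terms.begin(), doc_terms.end(), [&](const std::string& t) {
                return t.compare(0, search.query.size(), search.query) == 0;
            });
        default:
            return false;  // PHRASE, WILDCARD and UNKNOWN are omitted from this sketch
    }
}

// Small self-check showing the behavior the doc comments describe.
void CheckExamples() {
    using ST = FullTextSearch::SearchType;
    const FullTextSearch all{"content", 10, "Hello World", ST::MATCH_ALL};
    const FullTextSearch any{"content", 10, "Hello World", ST::MATCH_ANY};
    const FullTextSearch lower_prefix{"content", 10, "run", ST::PREFIX};
    const FullTextSearch upper_prefix{"content", 10, "Run", ST::PREFIX};

    assert(Matches(all, "hello big world"));     // both terms present
    assert(!Matches(all, "hello there"));        // "world" is missing
    assert(Matches(any, "hello there"));         // one term suffices
    assert(Matches(lower_prefix, "The Runner stops"));   // indexed term "runner"
    assert(!Matches(upper_prefix, "The Runner stops"));  // unanalyzed "Run" != "run..."
}

}  // namespace sketch
```

The last assertion is the practical consequence of the `@note` in the header: because a PREFIX query skips analysis, a caller who passes "Run" against an index built with a lowercasing analyzer would find nothing under these toy rules, so normalizing such queries is left to the caller.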