feat: support fuzzy query for FTS #3567

Closed
BubbleCal wants to merge 12 commits into lance-format:main from BubbleCal:fuzzy-query

@BubbleCal BubbleCal commented Mar 19, 2025

This introduces the fst library to implement fuzzy queries:

  • In general, an fst behaves like an immutable Map<String, u64>, but it also supports several kinds of string queries (e.g. fuzzy search, prefix match, substring, not-equal).
  • When building the FTS index, we store the tokens in a HashMap because we need mutability.
  • When loading the FTS index to serve queries, we load the tokens into an fst so that we can support fuzzy queries, and probably more kinds of queries in the future.
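
For readers who want a concrete picture of the build-then-load flow described above, here is a minimal sketch using only the Rust standard library. It is not the code in this PR and it does not use the fst crate: a sorted BTreeMap stands in for the fst, and fuzzy matching is a plain linear scan with Levenshtein distance rather than an automaton intersection. The names TokenBuilder, FrozenTokens, and levenshtein are hypothetical and chosen for illustration only.

```rust
use std::collections::{BTreeMap, HashMap};

/// Mutable token table used while building the index (hypothetical name).
#[derive(Default)]
struct TokenBuilder {
    tokens: HashMap<String, u64>,
}

impl TokenBuilder {
    /// Returns the id of `token`, assigning the next free id if it is new.
    fn add(&mut self, token: &str) -> u64 {
        let next_id = self.tokens.len() as u64;
        *self.tokens.entry(token.to_string()).or_insert(next_id)
    }

    /// Freezes the table into a sorted, read-only map for serving queries.
    fn freeze(self) -> FrozenTokens {
        FrozenTokens {
            tokens: self.tokens.into_iter().collect(),
        }
    }
}

/// Read-only token table: a std-only stand-in for an fst-backed map.
struct FrozenTokens {
    tokens: BTreeMap<String, u64>,
}

impl FrozenTokens {
    /// Exact lookup of a token id.
    fn get(&self, token: &str) -> Option<u64> {
        self.tokens.get(token).copied()
    }

    /// All (token, id) pairs within `max_dist` edits of `query`, in token order.
    /// Assumption: a linear scan is acceptable here; an fst would instead
    /// walk only the parts of the structure that an automaton accepts.
    fn fuzzy(&self, query: &str, max_dist: usize) -> Vec<(String, u64)> {
        self.tokens
            .iter()
            .filter(|(t, _)| levenshtein(t, query) <= max_dist)
            .map(|(t, id)| (t.clone(), *id))
            .collect()
    }
}

/// Two-row Levenshtein distance over Unicode scalar values.
fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == *cb { 0 } else { 1 };
            // Minimum of substitution, deletion, and insertion.
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}
```

In this sketch, a caller would add tokens through TokenBuilder::add while indexing, call freeze once when the index is loaded, and then use get for exact terms and fuzzy for approximate ones. The split mirrors the trade-off in the description: the mutable map is convenient to build, while the frozen form is the one that can answer richer queries.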

Other impacts:

  • fst uses less memory, especially when there are many similar tokens.
  • fst is slower than HashMap for looking up a token id, but in FTS most of the time is spent searching the posting lists, so this has no visible impact on query latency.

github-actions bot added the enhancement label Mar 19, 2025
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
@BubbleCal BubbleCal marked this pull request as ready for review March 19, 2025 11:16
codecov-commenter commented Mar 19, 2025

Codecov Report

Attention: Patch coverage is 77.09924% with 30 lines in your changes missing coverage. Please review.

Project coverage is 78.67%. Comparing base (babb5ab) to head (a956ecb).

Files with missing lines                         Patch %   Lines
rust/lance-index/src/scalar/inverted/index.rs    60.60%    23 Missing and 3 partials ⚠️
rust/lance-index/src/scalar.rs                   71.42%    4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3567      +/-   ##
==========================================
+ Coverage   78.65%   78.67%   +0.01%     
==========================================
  Files         258      258              
  Lines       96782    96896     +114     
  Branches    96782    96896     +114     
==========================================
+ Hits        76126    76229     +103     
- Misses      17588    17598      +10     
- Partials     3068     3069       +1     
Flag Coverage Δ
unittests 78.67% <77.09%> (+0.01%) ⬆️

@BubbleCal
Contributor Author

Moved to #3610.

@BubbleCal BubbleCal closed this Mar 27, 2025