feat(python): expose search_filter in scanner#5506
feat(python): expose search_filter in scanner#5506BubbleCal merged 1 commit intolance-format:mainfrom
Conversation
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
2f3a37d to
5d27211
Compare
|
hi @BubbleCal , could you help review this when you have time, thanks very much~ |
BubbleCal
left a comment
There was a problem hiding this comment.
Great work!
just need to add some doc strings then feel free to merge
| def filter( | ||
| self, | ||
| filter: Union[ | ||
| str, pa.compute.Expression, FullTextQuery, VectorSearchQuery, dict |
There was a problem hiding this comment.
let's add some doc strings for this param, esp for dict case
westonpace
left a comment
There was a problem hiding this comment.
Can we call this a "search filter" instead of a "query filter"? I think of queries as both "filtered scans" and "searches". This filter both:
- Only applies to searches (can't use this on a filtered scan)
- Is a search itself (e.g. a VectorSearchQuery or FullTextQuery)
2d0dc31 to
12ed9b0
Compare
|
Hi @westonpace , pr has been updated, please review when you have time, thanks very much! |
| filter=VectorSearchQuery( | ||
| "vector", | ||
| np.array([12, 17, 300, 10], dtype=np.float32), | ||
| 5, | ||
| 20, | ||
| True, | ||
| ) |
There was a problem hiding this comment.
I still don't actually know what happens in this case. Is the full text search applied after the vector search? Basically just re-ranking the vector search results?
Or is there some equivalent to "at least one token matches" that is being used here?
There was a problem hiding this comment.
Is the full text search applied after the vector search? Basically just re-ranking the vector search results?
Yes, in this case the vector search is used as a filter for full text search. But it doesn't mean the full text search is always applied after the vector search. The Scanner::prefilter determines whether the vector search filter is used before/after full text search.
If prefiter=false, we first do a full text search and get rows with score > 0, then we do refine filter using flat knn to get the top 5 nearest rows.
If prefilter=true, we first do a vector search and get 5 rows, then do a flat full text search based on the 5 rows. The full text search will drop the unmatched rows and re-rank the results by score.
f0d0626 to
6e81782
Compare
|
Hi @westonpace, @BubbleCal, shall we move forward? |
7e6fb80 to
448139c
Compare
448139c to
6a7ea2d
Compare
|
LGTM, thanks for adding this! |
Adding python api to support using fts as a filter for vector search, or using vector_query as a filter for fts search. Related to #4928.