Using FTS as a filter in vector search #4927

@wojiaodoubao

A common use case: a user stores tags in a text column. Among all records that contain a specified tag (e.g. city=shanghai), they need to find the records nearest to a given vector (e.g. the embedding of a traffic light image).

Currently, Scanner does not support setting both an FTS query and a vector query at the same time. I think we can support full-text search as a filter in a vector scan, for better performance and usability.

Since FTS acts as a filter on the vector scan, its behavior also depends on prefilter:

  1. If prefilter is true, we run the FTS query first and perform the vector search only over the matching rows.
  2. If prefilter is false, we perform the vector search first, then drop the results that do not satisfy the FTS condition.
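
To make the difference between the two modes concrete, here is a minimal sketch of the proposed semantics in plain Python. It does not use the Lance API: the in-memory ROWS table, the matches, nearest and search helpers, and the exact-token stand-in for FTS are all hypothetical and exist only for illustration.

```python
import math

# Hypothetical in-memory table: (row id, tags text, embedding).
ROWS = [
    (0, "city=shanghai type=traffic_light", [0.0, 0.0]),
    (1, "city=beijing type=traffic_light", [0.1, 0.0]),
    (2, "city=shanghai type=crosswalk", [0.5, 0.5]),
    (3, "city=beijing type=crosswalk", [0.2, 0.1]),
    (4, "city=shanghai type=traffic_light", [1.0, 1.0]),
]


def matches(text, term):
    # Stand-in for an FTS match: exact whitespace-delimited token match.
    return term in text.split()


def nearest(rows, query, k):
    # Brute-force k-nearest-neighbour search by Euclidean distance.
    return sorted(rows, key=lambda r: math.dist(r[2], query))[:k]


def search(rows, query, term, k, prefilter):
    if prefilter:
        # Filter by the text condition first, then search the survivors.
        candidates = [r for r in rows if matches(r[1], term)]
        hits = nearest(candidates, query, k)
    else:
        # Search all rows first, then drop hits that fail the text condition.
        hits = [r for r in nearest(rows, query, k) if matches(r[1], term)]
    return [r[0] for r in hits]


query = [0.1, 0.0]
print(search(ROWS, query, "city=shanghai", k=3, prefilter=True))
print(search(ROWS, query, "city=shanghai", k=3, prefilter=False))
```

With this data, prefilter=True returns ids [0, 2, 4], while prefilter=False returns only [0], because two of the three nearest rows are tagged city=beijing and are dropped after the search. This is the usual trade-off: postfiltering can return fewer than k rows, while prefiltering always searches within the matching set.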

LanceDB already supports hybrid search, a powerful feature that allows arbitrary reordering of query results through a Reranker. However, it requires executing two separate queries, even when we only want a vector search over the rows that match an FTS filter. I think Lance can handle this case better.
