feat: support stop-word gaps in phrase queries #6277
Signed-off-by: BubbleCal <bubble-cal@outlook.com>
Codecov Report: ✅ All modified and coverable lines are covered by tests.
westonpace left a comment
Nice fix. I didn't realize this could be a query-path-only change, very cool.
This change enables phrase queries to match across stop-word gaps.

Example: for `doc="love the format"` indexed with `remove_stop_words=True`, the index does not store the stop word `the`. With this change, users can still match the document with the phrase query `q="love the format"`. In this mode, all stop words are treated as equivalent placeholders for phrase matching, so `q="love a format"` will also match the same document.

This makes queries that contain stop words 3x~10x faster, at the cost of a little accuracy.

---------

Signed-off-by: BubbleCal <bubble-cal@outlook.com>
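The placeholder behavior described above can be illustrated with a toy positional index. This is a minimal sketch, not the code in this PR: `STOP_WORDS`, `index_doc`, and `phrase_match` are hypothetical names, the whitespace tokenizer and stop-word list are assumptions, and the sketch assumes the index keeps original token positions when it drops stop words.

```python
# Toy model only: every name here is illustrative, not part of the real API.
STOP_WORDS = {"a", "an", "the", "of", "to"}  # assumed, deliberately tiny list


def tokenize(text):
    """Lowercase and split on whitespace (assumed tokenizer)."""
    return text.lower().split()


def index_doc(text):
    """Map each non-stop-word term to the set of positions where it occurs.

    Stop words are not stored, but the remaining terms keep their original
    positions, so each removed stop word leaves a gap in the position sequence.
    """
    postings = {}
    for pos, term in enumerate(tokenize(text)):
        if term not in STOP_WORDS:
            postings.setdefault(term, set()).add(pos)
    return postings


def phrase_match(postings, query):
    """Return True if the query phrase matches the indexed document.

    Query stop words are never looked up, yet they still advance the position
    counter, so each one acts as a placeholder for exactly one token.
    In this toy model the placeholder position is not checked at all, which
    is one way the speed-for-accuracy trade-off can arise.
    """
    # (offset within the query, term) for every non-stop-word query term
    anchors = [(i, t) for i, t in enumerate(tokenize(query)) if t not in STOP_WORDS]
    if not anchors:
        return False  # a query made only of stop words has nothing to look up
    first_offset, first_term = anchors[0]
    for start in postings.get(first_term, ()):
        base = start - first_offset  # document position of the query's first token
        if base < 0:
            continue  # a leading placeholder would fall before the document start
        if all(base + off in postings.get(term, ()) for off, term in anchors):
            return True
    return False


doc = index_doc("love the format")      # stores love at 0 and format at 2
print(phrase_match(doc, "love the format"))  # True
print(phrase_match(doc, "love a format"))    # True: any stop word fills the gap
print(phrase_match(doc, "love format"))      # False: the gap is still required
```

Because only the non-stop-word terms are fetched from the index, a query containing stop words touches fewer posting lists, which is consistent with the speedup reported above. How the real implementation validates the placeholder positions may differ from this sketch.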