MB-69881: Re-architect vector search #2270
base: master
Commits: 65b931b, f6bf3af, ec24d99, 27f2d9f, da11922, a76bdab
```
@@ -22,6 +22,7 @@ import (
	"encoding/json"
	"fmt"

	"github.com/bits-and-blooms/bitset"
	index "github.com/blevesearch/bleve_index_api"
	segment_api "github.com/blevesearch/scorch_segment_api/v2"
)
```
```
@@ -48,14 +49,76 @@ func (is *IndexSnapshot) VectorReader(ctx context.Context, vector []float32,
// eligibleDocumentSelector is used to filter out documents that are eligible for
// the KNN search from a pre-filter query.
type eligibleDocumentSelector struct {
	// segment ID -> segment local doc nums
	eligibleDocNums map[int][]uint64
	// segment ID -> segment local doc nums in a bitset
	eligibleDocNums []*bitset.BitSet
	is *IndexSnapshot
}

// SegmentEligibleDocs returns the list of eligible local doc numbers for the given segment.
func (eds *eligibleDocumentSelector) SegmentEligibleDocs(segmentID int) []uint64 {
	return eds.eligibleDocNums[segmentID]
// eligibleDocumentList represents the list of eligible documents within a segment.
type eligibleDocumentList struct {
	bs *bitset.BitSet
}

// Iterator returns an iterator for the eligible document IDs.
func (edl *eligibleDocumentList) Iterator() index.EligibleDocumentIterator {
	if edl.bs == nil {
		// no eligible documents
		return emptyEligibleIterator
	}
	// return the iterator
	return &eligibleDocumentIterator{
		bs:  edl.bs,
		max: uint(edl.bs.Len()),
	}
}

// Count returns the number of eligible document IDs.
func (edl *eligibleDocumentList) Count() int {
	if edl.bs == nil {
		return 0
	}
	return int(edl.bs.Count())
}

// emptyEligibleDocumentList is a reusable empty eligible document list.
```
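The hunk replaces the per-segment `map[int][]uint64` of local doc numbers with one bitset per segment, read through a list type that offers `Iterator()` and `Count()`. What follows is a minimal sketch of that idea under stated assumptions. `miniBitSet` is a hypothetical stand-in that mimics the `Set`, `Count` and `NextSet` methods of `github.com/bits-and-blooms/bitset`, so the sketch runs with the standard library only. `eligibleList`, `eligibleIterator` and `fromDocNums` are illustrative names. The `Next() (uint64, bool)` shape is assumed and is not the actual `index.EligibleDocumentIterator` interface.

```go
package main

import "math/bits"

// miniBitSet is a HYPOTHETICAL stand-in for bits-and-blooms/bitset.BitSet,
// implementing only the methods this sketch needs.
type miniBitSet struct {
	length uint     // number of addressable bits (e.g. docs in the segment)
	words  []uint64 // backing storage, 64 bits per word
}

func newMiniBitSet(length uint) *miniBitSet {
	return &miniBitSet{length: length, words: make([]uint64, (length+63)/64)}
}

// Set marks bit i; out-of-range bits are ignored in this sketch.
func (b *miniBitSet) Set(i uint) {
	if i < b.length {
		b.words[i/64] |= 1 << (i % 64)
	}
}

// Count reports how many bits are set.
func (b *miniBitSet) Count() uint {
	n := 0
	for _, w := range b.words {
		n += bits.OnesCount64(w)
	}
	return uint(n)
}

// NextSet returns the first set bit at or after i and true, or 0 and false.
// Bits are only ever set below length, so a found index is always in range.
func (b *miniBitSet) NextSet(i uint) (uint, bool) {
	if i >= b.length {
		return 0, false
	}
	w := i / 64
	if word := b.words[w] >> (i % 64); word != 0 {
		return i + uint(bits.TrailingZeros64(word)), true
	}
	for j := w + 1; j < uint(len(b.words)); j++ {
		if b.words[j] != 0 {
			return j*64 + uint(bits.TrailingZeros64(b.words[j])), true
		}
	}
	return 0, false
}

// HYPOTHETICAL analogues of eligibleDocumentList and its iterator.
type eligibleList struct{ bs *miniBitSet }

type eligibleIterator struct {
	bs   *miniBitSet
	next uint // position from which the next NextSet search starts
}

func (l *eligibleList) Iterator() *eligibleIterator { return &eligibleIterator{bs: l.bs} }

// Count returns the number of eligible doc numbers (0 for a nil bitset).
func (l *eligibleList) Count() int {
	if l.bs == nil {
		return 0
	}
	return int(l.bs.Count())
}

// Next returns the next eligible local doc number; it relies on found alone.
func (it *eligibleIterator) Next() (uint64, bool) {
	if it.bs == nil {
		return 0, false
	}
	pos, found := it.bs.NextSet(it.next)
	if !found {
		return 0, false
	}
	it.next = pos + 1
	return uint64(pos), true
}

// fromDocNums converts a segment ID -> []uint64 map into one bitset per
// segment; segments without an entry keep a nil bitset (no eligible docs).
func fromDocNums(segmentSizes []uint, docNums map[int][]uint64) []*miniBitSet {
	out := make([]*miniBitSet, len(segmentSizes))
	for seg, nums := range docNums {
		if seg < 0 || seg >= len(out) {
			continue // unknown segment IDs are skipped in this sketch
		}
		bs := newMiniBitSet(segmentSizes[seg])
		for _, n := range nums {
			bs.Set(uint(n))
		}
		out[seg] = bs
	}
	return out
}
```

Iterating a segment's bitset yields doc numbers in ascending order. The trade-off is that its memory is proportional to the segment size rather than to the number of eligible documents, as is usual for a dense bitset compared with a sorted list.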
Suggested change:
```diff
- // emptyEligibleDocumentList is a reusable empty eligible document list.
+ // emptyEligibleDocumentList is a reusable empty eligible document list.
+ // It is intentionally defined as a shared singleton for performance reasons.
+ // The underlying eligibleDocumentList for this variable is immutable
+ // (its bitset is always nil), so it is safe to reuse this instance across
+ // goroutines and calls to Iterator(). If eligibleDocumentList gains mutable
+ // state in the future, this assumption must be revisited.
```
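The suggested comment argues that one shared empty list is safe because nothing ever writes to it. A minimal sketch of that pattern follows. It uses hypothetical `docList`/`docIter` types in which a nil slice plays the role of the nil bitset. The empty iterator's `Next` returns without touching any state, so many goroutines can share one instance, while non-empty lists still hand each caller its own mutable iterator.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// docIter is a HYPOTHETICAL iterator over a precomputed slice of doc numbers.
type docIter struct {
	docs []uint64
	pos  int
}

// Next returns the next doc number. With no docs it returns without writing
// to the iterator, so a shared empty instance is only ever read.
func (it *docIter) Next() (uint64, bool) {
	if it.pos >= len(it.docs) {
		return 0, false
	}
	d := it.docs[it.pos]
	it.pos++
	return d, true
}

// docList is a HYPOTHETICAL analogue of eligibleDocumentList: nil docs plays
// the role of the nil bitset and means "no eligible documents".
type docList struct{ docs []uint64 }

// Shared singletons; safe to reuse only because their docs are always nil.
var (
	emptyDocIter = &docIter{}
	emptyDocList = &docList{}
)

// Iterator hands out the shared empty iterator when there is nothing to visit
// and a fresh, per-caller iterator otherwise.
func (l *docList) Iterator() *docIter {
	if l.docs == nil {
		return emptyDocIter
	}
	return &docIter{docs: l.docs}
}

// countConcurrently drains l.Iterator() from n goroutines and returns the
// total number of doc numbers observed across all of them.
func countConcurrently(l *docList, n int) int64 {
	var total int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			it := l.Iterator()
			for {
				if _, ok := it.Next(); !ok {
					return
				}
				atomic.AddInt64(&total, 1)
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&total)
}
```

If the empty iterator ever advanced a cursor or cached state, sharing it would become a data race. A per-call allocation would then be needed, which is the caveat the suggested comment records.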
Copilot AI, Dec 28, 2025
The boundary check next >= it.max may be redundant. The found boolean returned by NextSet should be sufficient to determine if a valid set bit was found. The additional check next >= it.max appears to be a defensive check, but if NextSet returns found=true, the returned index should always be within valid bounds (0 to Len()-1). Consider whether this additional check is necessary, or document why it's included as a defensive measure.
Suggested change:
```diff
- if next >= it.max || !found {
+ if !found {
```
Copilot AI, Dec 28, 2025
The comment says "emptyIterator" but the variable name is "emptyEligibleIterator". Update the comment to match the actual variable name for consistency.
Suggested change:
```diff
- // emptyIterator is a reusable empty eligible document iterator.
+ // emptyEligibleIterator is a reusable empty eligible document iterator.
```
Grammar error in the comment: "postingsIterators is maintain" should be "postingsIterators maintains" or "the postingsIterators maintain".