MB-69881: Re-architect vector search #2270

CascadingRadium · 2025-12-28T05:18:07Z

Use a bitset to track eligible documents instead of a slice of N uint64s, reducing memory usage from 8N bytes to N/8 bytes per segment (up to 64× reduction) and improving cache locality.
Pass an iterator over eligible documents that iterates the bitset directly, allowing direct translation into a bitset of eligible vector IDs in the storage layer and eliminating the need for a separate slice intermediary.
Fix garbage creation in the UnadornedPostingsIterator, which previously allocated a temporary struct on every Next() call to wrap a doc number and satisfy the Postings interface. The iterator now returns a single reusable struct (a one-time allocation), consistent with how the PostingsIterator in the storage layer works.
Avoid unnecessary BytesRead statistics computation when executing searches in no-scoring mode, removing redundant work as a micro-optimization.
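
To make the first two points concrete, here is a minimal sketch of tracking eligible documents in a bitset and iterating it directly. It uses only the standard library; `eligibleBitset`, `eligibleIterator` and `collect` are hypothetical names for illustration and are not the types introduced by this PR, which may well rely on an existing bitset library instead.

```go
package main

import (
	"fmt"
	"math/bits"
)

// eligibleBitset is a hypothetical stand-in for the per-segment bitset:
// one bit per document in the segment.
type eligibleBitset struct {
	words []uint64
	size  uint64 // number of documents in the segment
}

func newEligibleBitset(numDocs uint64) *eligibleBitset {
	return &eligibleBitset{words: make([]uint64, (numDocs+63)/64), size: numDocs}
}

// set marks docNum as eligible.
func (b *eligibleBitset) set(docNum uint64) {
	b.words[docNum/64] |= 1 << (docNum % 64)
}

// nextSet returns the first eligible doc number >= from, if any.
func (b *eligibleBitset) nextSet(from uint64) (uint64, bool) {
	if from >= b.size {
		return 0, false
	}
	idx := from / 64
	// Clear the bits below from in the first word.
	w := b.words[idx] >> (from % 64) << (from % 64)
	for {
		if w != 0 {
			return idx*64 + uint64(bits.TrailingZeros64(w)), true
		}
		idx++
		if idx >= uint64(len(b.words)) {
			return 0, false
		}
		w = b.words[idx]
	}
}

// eligibleIterator walks the bitset directly; no []uint64 is materialized.
type eligibleIterator struct {
	bs      *eligibleBitset
	current uint64
}

func (it *eligibleIterator) next() (uint64, bool) {
	n, ok := it.bs.nextSet(it.current)
	if !ok {
		return 0, false
	}
	it.current = n + 1
	return n, true
}

// collect drains an iterator; it exists only to make the sketch checkable.
func collect(it *eligibleIterator) []uint64 {
	var out []uint64
	for n, ok := it.next(); ok; n, ok = it.next() {
		out = append(out, n)
	}
	return out
}

func main() {
	const numDocs = 1 << 20
	bs := newEligibleBitset(numDocs)
	for _, d := range []uint64{3, 64, 1000} {
		bs.set(d)
	}
	fmt.Println(collect(&eligibleIterator{bs: bs}))
	// Worst case for the slice: every document is eligible (8 bytes each).
	fmt.Println("slice bytes:", 8*numDocs, "bitset bytes:", len(bs.words)*8)
}
```

The 64× figure is the worst case for the slice, where all N documents of a segment are eligible: 8N bytes of `uint64` entries against roughly N/8 bytes of bits.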

Copilot

Pull request overview

This PR re-architects vector search to improve memory efficiency and reduce garbage collection pressure. The changes replace slice-based eligible document tracking with bitsets, achieving up to 64× memory reduction per segment, and optimize the iterator pattern to eliminate per-call allocations in the unadorned postings iterator.

Key changes:

Replaced slice-based eligible document tracking ([]uint64) with bitsets, reducing memory from 8N bytes to N/8 bytes per segment
Introduced iterator-based API for eligible documents that directly translates to bitset iteration at the storage layer
Fixed garbage creation in UnadornedPostingsIterator by reusing a single struct instance instead of allocating per Next() call
Optimized bytes read tracking to skip computation in no-scoring mode
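
The struct-reuse change can be sketched as follows. This is an illustration under assumptions, not the PR's code: `posting`, `unadornedPosting`, `allocatingIterator` and `reusingIterator` are hypothetical names, and the real Postings interface has more methods than the single `Number()` shown here.

```go
package main

import "fmt"

// posting is a hypothetical, reduced stand-in for the Postings interface.
type posting interface {
	Number() uint64
}

// unadornedPosting wraps a doc number. The pointer receiver lets a single
// instance be handed out repeatedly through the interface.
type unadornedPosting struct {
	docNum uint64
}

func (p *unadornedPosting) Number() uint64 { return p.docNum }

// allocatingIterator mimics the old pattern: a fresh posting per Next().
type allocatingIterator struct {
	docNums []uint64
	pos     int
}

func (it *allocatingIterator) Next() posting {
	if it.pos >= len(it.docNums) {
		return nil
	}
	p := &unadornedPosting{docNum: it.docNums[it.pos]} // new value on every call
	it.pos++
	return p
}

// reusingIterator mimics the new pattern: one posting embedded in the
// iterator, overwritten and returned on every Next().
type reusingIterator struct {
	docNums []uint64
	pos     int
	one     unadornedPosting
}

func (it *reusingIterator) Next() posting {
	if it.pos >= len(it.docNums) {
		return nil
	}
	it.one.docNum = it.docNums[it.pos]
	it.pos++
	return &it.one
}

func main() {
	docs := []uint64{5, 9, 12}
	old := &allocatingIterator{docNums: docs}
	for p := old.Next(); p != nil; p = old.Next() {
		fmt.Println("allocating:", p.Number())
	}
	it := &reusingIterator{docNums: docs}
	for p := it.Next(); p != nil; p = it.Next() {
		fmt.Println("reusing:", p.Number())
	}
}
```

The trade-off in this sketch is that a posting returned by the reusing iterator is only valid until the next call to `Next()`, so callers that need the value later must copy it out first.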

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| index/scorch/snapshot_vector_index.go | Introduces bitset-based eligible document storage and iterator API, replacing the previous slice-based approach |
| index/scorch/unadorned.go | Changes `UnadornedPosting` from `uint64` to struct with pointer receivers and adds reusable struct fields to iterators to eliminate per-call allocations |
| index/scorch/snapshot_index_tfr.go | Adds conditional bytes read tracking via `updateBytesRead` flag to skip computation in no-scoring mode |
| index/scorch/snapshot_index.go | Initializes `updateBytesRead` flag based on scoring requirements |
| index/scorch/optimize_knn.go | Removes `requiresFiltering` flag and updates to use new `SegmentEligibleDocuments` API |
| index/scorch/optimize.go | Sets `updateBytesRead` to false for unadorned term field readers |
| index/scorch/snapshot_index_vr.go | Updates `InterpretVectorIndex` call to remove filtering parameter |
| index_test.go | Updates expected bytes read values to reflect the optimization that skips unnecessary computation |

Copilot · 2025-12-28T06:39:03Z

index/scorch/snapshot_vector_index.go

+		return 0, false // no eligible documents
+	}
+	next, found := it.bs.NextSet(it.current)
+	if next >= it.max || !found {


The boundary check next >= it.max may be redundant. The found boolean returned by NextSet should be sufficient to determine if a valid set bit was found. The additional check next >= it.max appears to be a defensive check, but if NextSet returns found=true, the returned index should always be within valid bounds (0 to Len()-1). Consider whether this additional check is necessary, or document why it's included as a defensive measure.

Suggested change

-	if next >= it.max || !found {
+	if !found {
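
Whether the extra bound check is redundant depends on how `it.max` relates to the length of the underlying bitset, which the excerpt above does not show. The sketch below assumes the limit may be smaller than the bitset length; in that case the `found` flag alone is not enough. `nextSet` and `nextEligible` are hypothetical helpers over a plain `[]bool`, not the real bitset API.

```go
package main

import "fmt"

// nextSet mimics an (index, found) style lookup over a plain []bool.
// It is a hypothetical stand-in, not the real bitset library call.
func nextSet(set []bool, from uint) (uint, bool) {
	for i := from; i < uint(len(set)); i++ {
		if set[i] {
			return i, true
		}
	}
	return 0, false
}

// nextEligible returns the next set bit at or after current. When checkLimit
// is true it also rejects any hit at or beyond limit, mirroring the
// `next >= it.max || !found` condition under discussion.
func nextEligible(set []bool, current, limit uint, checkLimit bool) (uint, bool) {
	next, found := nextSet(set, current)
	if !found || (checkLimit && next >= limit) {
		return 0, false
	}
	return next, true
}

func main() {
	set := make([]bool, 8)
	set[6] = true // a set bit beyond the assumed limit of 4

	_, ok := nextEligible(set, 0, 4, true)
	fmt.Println("with limit check, found:", ok)

	n, ok := nextEligible(set, 0, 4, false)
	fmt.Println("without limit check, found:", ok, "index:", n)
}
```

If the limit always equals the bitset length, both variants behave identically and the suggestion is safe; if it can be smaller, dropping the check changes behavior.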

Copilot · 2025-12-28T06:39:04Z

index/scorch/snapshot_index_tfr.go

-			if bytesRead > prevBytesRead {
-				i.incrementBytesRead(bytesRead - prevBytesRead)
+			if i.updateBytesRead {
+				// postingsIterators is maintain the bytesRead stat in a cumulative fashion.


Grammar error in the comment: "postingsIterators is maintain" should be "postingsIterators maintains" or "the postingsIterators maintain".

Suggested change

-				// postingsIterators is maintain the bytesRead stat in a cumulative fashion.
+				// postingsIterators maintains the bytesRead stat in a cumulative fashion.
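
As an illustration of why a cumulative counter needs the before/after delta, and how a flag can skip that work entirely, here is a small sketch. `cumulativeSource` and `reader` are hypothetical types; only the `updateBytesRead` flag name is taken from the PR, and the real term field reader is considerably more involved.

```go
package main

import "fmt"

// cumulativeSource is a hypothetical stand-in for a postings iterator whose
// BytesRead() reports a running total rather than a per-call amount.
type cumulativeSource struct {
	total uint64
}

func (s *cumulativeSource) advance(n uint64)  { s.total += n }
func (s *cumulativeSource) BytesRead() uint64 { return s.total }

// reader is a hypothetical stand-in for a term field reader.
type reader struct {
	src             *cumulativeSource
	updateBytesRead bool   // false in no-scoring mode
	bytesRead       uint64 // reader-level statistic
}

// next advances the source by cost bytes. The stat is updated only when
// updateBytesRead is set; otherwise both BytesRead() calls are skipped.
func (r *reader) next(cost uint64) {
	var prev uint64
	if r.updateBytesRead {
		prev = r.src.BytesRead()
	}
	r.src.advance(cost)
	if r.updateBytesRead {
		// The source is cumulative, so only the delta belongs to this call.
		if cur := r.src.BytesRead(); cur > prev {
			r.bytesRead += cur - prev
		}
	}
}

func main() {
	scoring := &reader{src: &cumulativeSource{}, updateBytesRead: true}
	scoring.next(10)
	scoring.next(5)
	fmt.Println("scoring mode bytesRead:", scoring.bytesRead)

	noScoring := &reader{src: &cumulativeSource{}}
	noScoring.next(10)
	noScoring.next(5)
	fmt.Println("no-scoring mode bytesRead:", noScoring.bytesRead)
}
```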

Copilot · 2025-12-28T06:39:04Z

index/scorch/snapshot_vector_index.go

+	return int(edl.bs.Count())
+}
+
+// emptyEligibleDocumentList is a reusable empty eligible document list.


The emptyEligibleDocumentList is being reused globally. While this is typically safe for read-only operations, each call to Iterator() will create a new iterator that references the same underlying nil bitset, which should be safe. However, for consistency and to avoid potential future issues if this struct gains mutable state, consider documenting that this is intentionally a shared singleton for performance reasons.

Suggested change

-// emptyEligibleDocumentList is a reusable empty eligible document list.
+// emptyEligibleDocumentList is a reusable empty eligible document list.
+// It is intentionally defined as a shared singleton for performance reasons.
+// The underlying eligibleDocumentList for this variable is immutable
+// (its bitset is always nil), so it is safe to reuse this instance across
+// goroutines and calls to Iterator(). If eligibleDocumentList gains mutable
+// state in the future, this assumption must be revisited.
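
The safety argument can be illustrated with a small sketch: a list that is never mutated after construction can be shared freely, provided each caller gets its own iterator holding its own cursor. `docList`, `docIterator`, `emptyDocList` and `countConcurrently` are hypothetical names, and a slice stands in for the bitset.

```go
package main

import (
	"fmt"
	"sync"
)

// docList is a hypothetical immutable list of eligible documents.
// Nothing writes to docs after construction.
type docList struct {
	docs []uint64
}

// docIterator holds the only mutable state: a per-caller cursor.
type docIterator struct {
	docs []uint64
	pos  int
}

// Iterator returns a fresh iterator on every call.
func (l *docList) Iterator() *docIterator { return &docIterator{docs: l.docs} }

// Count returns the number of eligible documents.
func (l *docList) Count() int { return len(l.docs) }

func (it *docIterator) Next() (uint64, bool) {
	if it.pos >= len(it.docs) {
		return 0, false
	}
	d := it.docs[it.pos]
	it.pos++
	return d, true
}

// emptyDocList is a shared singleton; its backing slice is always nil.
var emptyDocList = &docList{}

// countConcurrently iterates the same list from several goroutines and
// reports how many documents each one saw.
func countConcurrently(l *docList, workers int) []int {
	counts := make([]int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			it := l.Iterator()
			for _, ok := it.Next(); ok; _, ok = it.Next() {
				counts[w]++
			}
		}(w)
	}
	wg.Wait()
	return counts
}

func main() {
	fmt.Println(countConcurrently(emptyDocList, 4))
	fmt.Println(countConcurrently(&docList{docs: []uint64{1, 2, 3}}, 4))
}
```

A shared iterator instance would only be safe under the same condition, namely that it carries no cursor that advances.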

Copilot · 2025-12-28T06:39:04Z

index/scorch/snapshot_vector_index.go

+	return uint64(next), true
+}
+
+// emptyIterator is a reusable empty eligible document iterator.


The comment says "emptyIterator" but the variable name is "emptyEligibleIterator". Update the comment to match the actual variable name for consistency.

Suggested change

-// emptyIterator is a reusable empty eligible document iterator.
+// emptyEligibleIterator is a reusable empty eligible document iterator.

CascadingRadium added 6 commits December 25, 2025 01:26

minor opt · 65b931b
remove redundant variable · f6bf3af
overhaul the eligible iterator for performance · ec24d99
fix bytes stat · 27f2d9f
fix unadorned posting garbage · da11922
micro optimization · a76bdab

CascadingRadium added this to Vector Search v2 Dec 28, 2025

github-project-automation bot moved this to Todo in Vector Search v2 Dec 28, 2025

CascadingRadium moved this from Todo to Done in Vector Search v2 Dec 28, 2025

CascadingRadium requested review from Likith101, Thejas-bhat, abhinavdangeti, capemox, Copilot and maneuvertomars December 28, 2025 06:35

Copilot started reviewing on behalf of CascadingRadium December 28, 2025 06:35

Copilot AI reviewed Dec 28, 2025

2 participants
