MB-69655: Fix vector normalization to handle multi-vectors correctly #2260
2f2e259 to 4e3891f
Pull request overview
This PR fixes a critical bug in multi-vector normalization for cosine similarity searches. Previously, when indexing multi-vector fields (e.g., [[3,0,0], [0,4,0]]), the normalization was incorrectly applied to the entire flattened array rather than normalizing each sub-vector independently, resulting in incorrect similarity scores.
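As a worked check of this failure mode, take the example field `[[3,0,0], [0,4,0]]`. The arithmetic below assumes, purely for illustration, that the query is the first sub-vector `[3,0,0]` and that the cosine score is the inner product of unit-normalized vectors; neither detail is confirmed by the PR text.

```latex
% Old behavior: normalize the flattened array as a single vector
\lVert (3,0,0,0,4,0) \rVert = \sqrt{3^2 + 4^2} = 5
\qquad\Rightarrow\qquad
\tfrac{1}{5}(3,0,0,0,4,0) = (0.6,\,0,\,0,\,0,\,0.8,\,0)

% Stored sub-vectors are then (0.6,0,0) and (0,0.8,0), neither of unit length.
% Query \hat{q} = (3,0,0)/3 = (1,0,0):
\hat{q} \cdot (0.6,0,0) = 0.6

% New behavior: normalize each sub-vector independently
\tfrac{1}{3}(3,0,0) = (1,0,0), \qquad \tfrac{1}{4}(0,4,0) = (0,1,0)
\qquad\Rightarrow\qquad
\hat{q} \cdot (1,0,0) = 1.0
```

Under these assumptions the numbers agree with the 0.6 versus 1.0 scores quoted in the commit message for this PR.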
Key changes:
- Added `NormalizeMultiVector(vec, dims)` function that normalizes each sub-vector separately for multi-vector fields
- Updated normalization calls in `processVector` and `processVectorBase64` to use the new function
- Refactored `NormalizeVector` to use `slices.Clone` instead of manual copying
- Added comprehensive unit tests for multi-vector normalization with various edge cases
- Added end-to-end integration test verifying correct cosine similarity scores for multi-vector fields
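The per-sub-vector normalization described in the list above can be sketched as follows. This is a minimal illustration, not the code merged in the PR: `normalizeMultiVectorSketch` is a hypothetical name, and its handling of a non-positive `dims` or a length that is not a multiple of `dims` (returning an unmodified copy) is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// normalizeMultiVectorSketch is a hypothetical stand-in for the PR's
// NormalizeMultiVector(vec, dims). It treats vec as consecutive sub-vectors
// of length dims and scales each one to unit L2 norm. The input slice is
// never modified; a copy is returned.
func normalizeMultiVectorSketch(vec []float32, dims int) []float32 {
	out := make([]float32, len(vec))
	copy(out, vec)
	// Assumption: for invalid shapes, return the copy untouched.
	if dims <= 0 || len(out)%dims != 0 {
		return out
	}
	for start := 0; start < len(out); start += dims {
		sub := out[start : start+dims]
		var sumSq float64
		for _, x := range sub {
			sumSq += float64(x) * float64(x)
		}
		if sumSq == 0 {
			continue // an all-zero sub-vector has no direction; leave it as is
		}
		norm := float32(math.Sqrt(sumSq))
		for i := range sub {
			sub[i] /= norm
		}
	}
	return out
}

func main() {
	// The flattened form of [[3,0,0], [0,4,0]] with dims = 3.
	flat := []float32{3, 0, 0, 0, 4, 0}
	fmt.Println(normalizeMultiVectorSketch(flat, 3)) // prints [1 0 0 0 1 0]
}
```

Calling the sketch with `dims` equal to the full length of the slice reduces it to ordinary single-vector normalization, which is the behavior the flattened multi-vector input was previously getting by mistake.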
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `mapping/mapping_vectors.go` | Implements `NormalizeMultiVector` to normalize each sub-vector independently; updates normalization calls to use new function; refactors `NormalizeVector` to use `slices.Clone` |
| `mapping/mapping_vectors_test.go` | Adds comprehensive unit tests for `NormalizeMultiVector` covering single vectors, multi-vectors, edge cases (empty, zero/negative dims), and helper functions for magnitude computation and float comparison |
| `search_knn_test.go` | Adds end-to-end integration test `TestMultiVectorCosineNormalization` verifying correct similarity scores (1.0 for exact matches) on both single-vector and multi-vector fields with cosine similarity |
The base branch was changed.
Changed the base branch to …
Hey @abhinavdangeti, I had set the branch to …
Ah ok, let's merge this only after that goes in then.
abhinavdangeti left a comment
Looks good 👍🏼.
@CascadingRadium let's merge this one only after the knnDup branch is merged into master and the base branch here is changed to master, so that we produce 2 separate commits for these two separate issues.
…2260)
- When indexing multi-vector fields (e.g., `[[3,0,0], [0,4,0]]`) with `cosine` similarity, normalization was incorrectly applied to the entire flattened array instead of each sub-vector independently, resulting in degraded similarity scores.
- Added `NormalizeMultiVector(vec, dims)` that normalizes each sub-vector separately, fixing scores for multi-vector documents (e.g., score now correctly returns 1.0 instead of 0.6 for exact matches).
…2264)
Author: @CascadingRadium from #2260
- When indexing multi-vector fields (e.g., `[[3,0,0], [0,4,0]]`) with `cosine` similarity, normalization was incorrectly applied to the entire flattened array instead of each sub-vector independently, resulting in degraded similarity scores.
- Added `NormalizeMultiVector(vec, dims)` that normalizes each sub-vector separately, fixing scores for multi-vector documents (e.g., score now correctly returns 1.0 instead of 0.6 for exact matches).

Co-authored-by: Rahul Rampure <rahul.rampure@couchbase.com>