Skip to content

Conversation

@CascadingRadium
Copy link
Member

@CascadingRadium CascadingRadium commented Dec 4, 2025

  • When indexing multi-vector fields (e.g., [[3,0,0], [0,4,0]]) with cosine similarity, normalization was incorrectly applied to the entire flattened array instead of each sub-vector independently, resulting in degraded similarity scores.
  • Added NormalizeMultiVector(vec, dims) that normalizes each sub-vector separately, fixing scores for multi-vector documents (e.g., score now correctly returns 1.0 instead of 0.6 for exact matches).

@CascadingRadium CascadingRadium changed the base branch from master to knnDup December 4, 2025 01:12
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a critical bug in multi-vector normalization for cosine similarity searches. Previously, when indexing multi-vector fields (e.g., [[3,0,0], [0,4,0]]), the normalization was incorrectly applied to the entire flattened array rather than normalizing each sub-vector independently, resulting in incorrect similarity scores.

Key changes:

  • Added NormalizeMultiVector(vec, dims) function that normalizes each sub-vector separately for multi-vector fields
  • Updated normalization calls in processVector and processVectorBase64 to use the new function
  • Refactored NormalizeVector to use slices.Clone instead of manual copying
  • Added comprehensive unit tests for multi-vector normalization with various edge cases
  • Added end-to-end integration test verifying correct cosine similarity scores for multi-vector fields

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
mapping/mapping_vectors.go Implements NormalizeMultiVector to normalize each sub-vector independently; updates normalization calls to use new function; refactors NormalizeVector to use slices.Clone
mapping/mapping_vectors_test.go Adds comprehensive unit tests for NormalizeMultiVector covering single vectors, multi-vectors, edge cases (empty, zero/negative dims), and helper functions for magnitude computation and float comparison
search_knn_test.go Adds end-to-end integration test TestMultiVectorCosineNormalization verifying correct similarity scores (1.0 for exact matches) on both single-vector and multi-vector fields with cosine similarity

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@abhinavdangeti abhinavdangeti added this to the v2.5.7 milestone Dec 4, 2025
Likith101
Likith101 previously approved these changes Dec 4, 2025
@abhinavdangeti abhinavdangeti changed the base branch from knnDup to master December 4, 2025 14:54
@abhinavdangeti abhinavdangeti dismissed Likith101’s stale review December 4, 2025 14:54

The base branch was changed.

@abhinavdangeti
Copy link
Member

Changed the base branch to master - so we can back port these commits easily.
@CascadingRadium would you resolve the merge conflicts.

@CascadingRadium
Copy link
Member Author

hey @abhinavdangeti, i had set the branch to knnDup as the test i have added here will consistently fail without the patch in the knnDup branch

@abhinavdangeti
Copy link
Member

Ah ok, let's merge this only after that goes in then.

@abhinavdangeti abhinavdangeti changed the base branch from master to knnDup December 4, 2025 16:42
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Member

@abhinavdangeti abhinavdangeti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👍🏼 .
@CascadingRadium lets merge this one only after the knnDup branch is merged into master and the base branch here is changed to master - so we produce 2 separate commits for these two separate issues.

@CascadingRadium CascadingRadium merged commit a233b67 into knnDup Dec 8, 2025
15 checks passed
@CascadingRadium CascadingRadium deleted the cosineFix branch December 8, 2025 06:18
@abhinavdangeti abhinavdangeti restored the cosineFix branch December 8, 2025 15:01
abhinavdangeti added a commit that referenced this pull request Dec 8, 2025
abhinavdangeti pushed a commit that referenced this pull request Dec 8, 2025
…2260)

- When indexing multi-vector fields (e.g., `[[3,0,0], [0,4,0]]`) with
`cosine` similarity, normalization was incorrectly applied to the entire
flattened array instead of each sub-vector independently, resulting in
degraded similarity scores.
- Added `NormalizeMultiVector(vec, dims)` that normalizes each
sub-vector separately, fixing scores for multi-vector documents (e.g.,
score now correctly returns 1.0 instead of 0.6 for exact matches).
abhinavdangeti pushed a commit that referenced this pull request Dec 9, 2025
…2260)

- When indexing multi-vector fields (e.g., `[[3,0,0], [0,4,0]]`) with
`cosine` similarity, normalization was incorrectly applied to the entire
flattened array instead of each sub-vector independently, resulting in
degraded similarity scores.
- Added `NormalizeMultiVector(vec, dims)` that normalizes each
sub-vector separately, fixing scores for multi-vector documents (e.g.,
score now correctly returns 1.0 instead of 0.6 for exact matches).
@abhinavdangeti abhinavdangeti removed this from the v2.5.7 milestone Dec 12, 2025
CascadingRadium added a commit that referenced this pull request Dec 15, 2025
…2260)

- When indexing multi-vector fields (e.g., `[[3,0,0], [0,4,0]]`) with
`cosine` similarity, normalization was incorrectly applied to the entire
flattened array instead of each sub-vector independently, resulting in
degraded similarity scores.
- Added `NormalizeMultiVector(vec, dims)` that normalizes each
sub-vector separately, fixing scores for multi-vector documents (e.g.,
score now correctly returns 1.0 instead of 0.6 for exact matches).
CascadingRadium added a commit that referenced this pull request Dec 15, 2025
…2264)

Author: @CascadingRadium from
#2260

- When indexing multi-vector fields (e.g., `[[3,0,0], [0,4,0]]`) with
`cosine` similarity, normalization was incorrectly applied to the entire
flattened array instead of each sub-vector independently, resulting in
degraded similarity scores.
- Added `NormalizeMultiVector(vec, dims)` that normalizes each
sub-vector separately, fixing scores for multi-vector documents (e.g.,
score now correctly returns 1.0 instead of 0.6 for exact matches).

---------

Co-authored-by: Rahul Rampure <rahul.rampure@couchbase.com>
abhinavdangeti added a commit that referenced this pull request Dec 15, 2025
…ctly (#2264)

Author: @CascadingRadium from
#2260

- When indexing multi-vector fields (e.g., `[[3,0,0], [0,4,0]]`) with
`cosine` similarity, normalization was incorrectly applied to the entire
flattened array instead of each sub-vector independently, resulting in
degraded similarity scores.
- Added `NormalizeMultiVector(vec, dims)` that normalizes each
sub-vector separately, fixing scores for multi-vector documents (e.g.,
score now correctly returns 1.0 instead of 0.6 for exact matches).

---------

Co-authored-by: Rahul Rampure <rahul.rampure@couchbase.com>
abhinavdangeti added a commit that referenced this pull request Dec 15, 2025
…ctly (#2264)

Author: @CascadingRadium from
#2260

- When indexing multi-vector fields (e.g., `[[3,0,0], [0,4,0]]`) with
`cosine` similarity, normalization was incorrectly applied to the entire
flattened array instead of each sub-vector independently, resulting in
degraded similarity scores.
- Added `NormalizeMultiVector(vec, dims)` that normalizes each
sub-vector separately, fixing scores for multi-vector documents (e.g.,
score now correctly returns 1.0 instead of 0.6 for exact matches).

---------

Co-authored-by: Rahul Rampure <rahul.rampure@couchbase.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants