MB-69655: Fix vector normalization to handle multi-vectors correctly #2260

CascadingRadium · 2025-12-04T00:19:33Z

- When indexing multi-vector fields (e.g., `[[3,0,0], [0,4,0]]`) with `cosine` similarity, normalization was incorrectly applied to the entire flattened array instead of each sub-vector independently, resulting in degraded similarity scores.
- Added `NormalizeMultiVector(vec, dims)` that normalizes each sub-vector separately, fixing scores for multi-vector documents (e.g., score now correctly returns 1.0 instead of 0.6 for exact matches).

Copilot

Pull request overview

This PR fixes a critical bug in multi-vector normalization for cosine similarity searches. Previously, when indexing multi-vector fields (e.g., [[3,0,0], [0,4,0]]), the normalization was incorrectly applied to the entire flattened array rather than normalizing each sub-vector independently, resulting in incorrect similarity scores.

Key changes:

- Added `NormalizeMultiVector(vec, dims)` function that normalizes each sub-vector separately for multi-vector fields
- Updated normalization calls in `processVector` and `processVectorBase64` to use the new function
- Refactored `NormalizeVector` to use `slices.Clone` instead of manual copying
- Added comprehensive unit tests for multi-vector normalization with various edge cases
- Added end-to-end integration test verifying correct cosine similarity scores for multi-vector fields

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `mapping/mapping_vectors.go` | Implements `NormalizeMultiVector` to normalize each sub-vector independently; updates normalization calls to use new function; refactors `NormalizeVector` to use `slices.Clone` |
| `mapping/mapping_vectors_test.go` | Adds comprehensive unit tests for `NormalizeMultiVector` covering single vectors, multi-vectors, edge cases (empty, zero/negative dims), and helper functions for magnitude computation and float comparison |
| `search_knn_test.go` | Adds end-to-end integration test `TestMultiVectorCosineNormalization` verifying correct similarity scores (1.0 for exact matches) on both single-vector and multi-vector fields with cosine similarity |
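The test helpers described for `mapping/mapping_vectors_test.go` are not shown on this page. As a rough illustration only, magnitude and float-comparison helpers of that kind might look like the following; `magnitude`, `almostEqual`, and `subVectorsAreUnit` are hypothetical names, and the tolerance is an assumption.

```go
package main

import "math"

// magnitude returns the Euclidean length of vec.
func magnitude(vec []float32) float64 {
	var sumSq float64
	for _, v := range vec {
		sumSq += float64(v) * float64(v)
	}
	return math.Sqrt(sumSq)
}

// almostEqual reports whether a and b differ by at most eps.
func almostEqual(a, b, eps float64) bool {
	return math.Abs(a-b) <= eps
}

// subVectorsAreUnit reports whether every dims-sized chunk of a flattened
// multi-vector has magnitude 1 within eps. It returns false when dims is
// not positive or does not evenly divide len(vec).
func subVectorsAreUnit(vec []float32, dims int, eps float64) bool {
	if dims <= 0 || len(vec)%dims != 0 {
		return false
	}
	for start := 0; start < len(vec); start += dims {
		if !almostEqual(magnitude(vec[start:start+dims]), 1.0, eps) {
			return false
		}
	}
	return true
}
```

A check like `subVectorsAreUnit(normalized, 3, 1e-6)` passes for `[1,0,0,0,1,0]` and fails for `[0.6,0,0,0,0.8,0]`, which is the shape that whole-array normalization produces for the example in the description.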

mapping/mapping_vectors.go

The base branch was changed.

abhinavdangeti · 2025-12-04T14:55:33Z

Changed the base branch to master - so we can back port these commits easily.
@CascadingRadium would you resolve the merge conflicts?

CascadingRadium · 2025-12-04T16:33:56Z

Hey @abhinavdangeti, I had set the base branch to knnDup because the test I added here will consistently fail without the patch in the knnDup branch.

abhinavdangeti · 2025-12-04T16:41:03Z

Ah ok, let's merge this only after that goes in then.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

mapping/mapping_vectors.go

abhinavdangeti

Looks good 👍🏼.
@CascadingRadium let's merge this one only after the knnDup branch is merged into master and the base branch here is changed to master, so that we produce 2 separate commits for these two separate issues.

…2264) Author: @CascadingRadium from #2260

- When indexing multi-vector fields (e.g., `[[3,0,0], [0,4,0]]`) with `cosine` similarity, normalization was incorrectly applied to the entire flattened array instead of each sub-vector independently, resulting in degraded similarity scores.
- Added `NormalizeMultiVector(vec, dims)` that normalizes each sub-vector separately, fixing scores for multi-vector documents (e.g., score now correctly returns 1.0 instead of 0.6 for exact matches).

---------

Co-authored-by: Rahul Rampure <rahul.rampure@couchbase.com>

CascadingRadium added 4 commits December 3, 2025 17:15

Fix duplicate results when performing KNN search

35ba4ff

code review

3b5d30c

Fix vector normalization to handle multi-vectors correctly

781335b

UT

4e3891f

CascadingRadium force-pushed the cosineFix branch from 2f2e259 to 4e3891f December 4, 2025 01:12

CascadingRadium changed the base branch from master to knnDup December 4, 2025 01:12

merge conflict

9dae832

CascadingRadium requested review from Likith101, Thejas-bhat, abhinavdangeti, capemox, Copilot and maneuvertomars December 4, 2025 01:13

Copilot started reviewing on behalf of CascadingRadium December 4, 2025 01:14

Copilot finished reviewing on behalf of CascadingRadium December 4, 2025 01:15

Copilot AI reviewed Dec 4, 2025

mapping/mapping_vectors.go (thread resolved)

abhinavdangeti added this to the v2.5.7 milestone Dec 4, 2025

Likith101 previously approved these changes Dec 4, 2025

abhinavdangeti changed the base branch from knnDup to master December 4, 2025 14:54

abhinavdangeti changed the base branch from master to knnDup December 4, 2025 16:42

CascadingRadium added 3 commits December 5, 2025 14:51

Merge branch 'knnDup' into cosineFix

9ac8392

use normalizeVector for base64

68760c2

fix merge conflict

99e2120

CascadingRadium requested review from Likith101 and Copilot December 5, 2025 09:37

Copilot started reviewing on behalf of CascadingRadium December 5, 2025 09:37

Copilot finished reviewing on behalf of CascadingRadium December 5, 2025 09:39

Copilot AI reviewed Dec 5, 2025

mapping/mapping_vectors.go (thread resolved)

abhinavdangeti reviewed Dec 5, 2025

mapping/mapping_vectors.go (thread resolved)

abhinavdangeti approved these changes Dec 5, 2025

capemox approved these changes Dec 8, 2025

CascadingRadium merged commit a233b67 into knnDup Dec 8, 2025
15 checks passed

CascadingRadium deleted the cosineFix branch December 8, 2025 06:18

abhinavdangeti restored the cosineFix branch December 8, 2025 15:01

abhinavdangeti added a commit that referenced this pull request Dec 8, 2025

Revert "MB-69655: Fix vector normalization to handle multi-vectors co…

fc32bb8

…rrectly (#2260)" This reverts commit a233b67.

abhinavdangeti mentioned this pull request Dec 8, 2025

MB-69655: Fix vector normalization to handle multi-vectors correctly #2264

Merged

abhinavdangeti removed this from the v2.5.7 milestone Dec 12, 2025

5 participants
