fix!: check metric compatibility before using vector index #5609
wjones127 merged 5 commits into lance-format:main from …
Previously, vector search used an ANN index regardless of whether the index's metric type matched the metric requested by the query. This produced incorrect distances when, for example, an index built with `metric="dot"` served a query with `metric="l2"`.

Now the scanner checks whether the index's metric matches the user's requested metric. If they don't match, it silently falls back to flat search. If the user doesn't specify a metric, the index's metric is used.

Changes:
- `Query.metric_type` is now `Option<DistanceType>` (`None` = use the index default)
- The scanner checks metric compatibility before using an index
- The explain plan now shows the metric being used
- Java bindings updated to make `distanceType` optional

Fixes lance-format#5608

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
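The selection rule described in this PR can be sketched in Rust. This is a minimal illustration under stated assumptions, not the scanner's actual code: `DistanceType` here is a local stand-in that only mirrors variant names, and `Plan` and `choose_plan` are hypothetical names introduced for the example.

```rust
/// Local stand-in for the distance types discussed in this PR (hypothetical, not Lance's enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DistanceType {
    L2,
    Cosine,
    Dot,
}

/// Hypothetical description of how a vector query will be executed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Plan {
    /// Use the ANN index; distances are computed with the given (index) metric.
    Index(DistanceType),
    /// Brute-force (flat) search with the given metric.
    Flat(DistanceType),
}

/// Decide whether an ANN index built with `index_metric` may serve a query
/// that requested `requested` (`None` means "use the index default").
pub fn choose_plan(requested: Option<DistanceType>, index_metric: DistanceType) -> Plan {
    match requested {
        // No metric specified: adopt the index's metric and use the index.
        None => Plan::Index(index_metric),
        // Metric matches the index: the index yields correct distances.
        Some(m) if m == index_metric => Plan::Index(m),
        // Metric differs: the index would return wrong distances, so search flat.
        Some(m) => Plan::Flat(m),
    }
}
```

With this rule, an index built with dot product and queried with L2 yields `Plan::Flat(DistanceType::L2)` instead of silently returning dot-product distances.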
Code Review

Summary: This PR fixes a critical bug (#5608) where vector search incorrectly used the ANN index regardless of the query's metric type, returning wrong distances. The fix changes …

No Major Issues Found

The implementation is correct and well-tested. The logic properly: …
Minor Observations (Non-blocking)

1. Double index open in scanner.rs (lines ~3037-3066): The index is opened to check metric compatibility, but then …

2. Java API backward compatibility: The change from required … Previously, `new Query.Builder().setKey(...).build()` used L2; it now uses the index default or the data-type default. This is the intended behavior per the fix, but downstream Java users relying on implicit L2 may see different results if their index uses a different metric. This seems acceptable for a bug fix, but it is worth noting in the release notes.

Tests

Good test coverage with …

LGTM
When a user specifies a metric that is incompatible with the data type (e.g., L2 on binary vectors), use the index with its own metric rather than falling back to flat search, which would fail.

The logic now is:
- If the metrics match: use the index
- If the user's metric is incompatible with the data type: use the index
- If the user's metric is compatible but differs from the index's: flat search

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
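The three-branch rule in this commit message can be sketched as follows. Note that the next commit in the PR reverts commit 0a8d618, and the final BREAKING CHANGE notes describe no data-type exception, so this shows the intermediate idea only. The compatibility table is an assumption for illustration (Hamming only for binary vectors; L2, Cosine and Dot only for float vectors), and `VectorKind`, `is_compatible`, `Plan` and `choose_plan` are hypothetical names.

```rust
/// Local stand-in for distance types (hypothetical, not Lance's enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DistanceType {
    L2,
    Cosine,
    Dot,
    Hamming,
}

/// Hypothetical coarse classification of the vector column's data type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VectorKind {
    Float,
    Binary,
}

/// Hypothetical execution plan: ANN index or flat (brute-force) search.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Plan {
    Index(DistanceType),
    Flat(DistanceType),
}

/// Assumed compatibility table, for illustration only.
pub fn is_compatible(metric: DistanceType, kind: VectorKind) -> bool {
    match metric {
        DistanceType::Hamming => kind == VectorKind::Binary,
        DistanceType::L2 | DistanceType::Cosine | DistanceType::Dot => kind == VectorKind::Float,
    }
}

pub fn choose_plan(
    requested: Option<DistanceType>,
    index_metric: DistanceType,
    kind: VectorKind,
) -> Plan {
    match requested {
        // No metric requested: use the index with its own metric.
        None => Plan::Index(index_metric),
        // Metrics match: use the index.
        Some(m) if m == index_metric => Plan::Index(m),
        // Requested metric cannot apply to this data type (flat search would fail):
        // use the index with its own metric.
        Some(m) if !is_compatible(m, kind) => Plan::Index(index_metric),
        // Compatible but different from the index: flat search.
        Some(m) => Plan::Flat(m),
    }
}
```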
This reverts commit 0a8d618.
rpgreen left a comment
While this is a bugfix, it could be a surprising behavior change. Should it be flagged as a breaking change?
…mat#5609)

BREAKING CHANGE: `Query.metric_type` is now `Option<DistanceType>` instead of `DistanceType`, and the default is `None`. This means it defaults to the vector index's distance type, or to L2 (the old default) if no index has been built. The other breaking change is that if you explicitly pass a distance type but the index was built with a different one, the query now falls back to flat search instead of overriding the distance type.

## Summary
- Fix vector search using an ANN index regardless of the query's metric type
- `Query.metric_type` is now `Option<DistanceType>` (`None` = use the index default)
- If the user specifies a metric that doesn't match the index, fall back to flat search
- Explain plan now shows the metric being used

## Test plan
- [x] Added regression test `test_knn_metric_mismatch_falls_back_to_flat_search`
- [x] Added test `test_knn_no_metric_uses_index_metric`
- [x] Existing tests pass with updated explain plan expectations

Fixes lance-format#5608

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
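To make the two breaking changes concrete, here is a before/after sketch. It is a simplification under stated assumptions, not the real scanner logic: `DistanceType` is a local stand-in, `old_plan` and `new_plan` are hypothetical names, and options such as disabling the index are ignored. `old_plan` models the previous behavior as "a metric is always present (default L2) and an existing index wins"; `new_plan` models the `Option` default and the flat-search fallback.

```rust
/// Local stand-in for distance types (hypothetical, not Lance's enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DistanceType {
    L2,
    Cosine,
    Dot,
}

/// Hypothetical execution plan: ANN index or flat (brute-force) search.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Plan {
    Index(DistanceType),
    Flat(DistanceType),
}

/// Old behavior (sketch): the query always carried a metric (default L2),
/// and an existing index was used anyway, so its metric silently overrode it.
pub fn old_plan(requested: DistanceType, index_metric: Option<DistanceType>) -> Plan {
    match index_metric {
        Some(idx) => Plan::Index(idx),
        None => Plan::Flat(requested),
    }
}

/// New behavior (sketch): `None` means "index metric, else L2";
/// an explicit metric that differs from the index forces flat search.
pub fn new_plan(requested: Option<DistanceType>, index_metric: Option<DistanceType>) -> Plan {
    match (requested, index_metric) {
        (None, Some(idx)) => Plan::Index(idx),
        (None, None) => Plan::Flat(DistanceType::L2),
        (Some(m), Some(idx)) if m == idx => Plan::Index(m),
        (Some(m), _) => Plan::Flat(m),
    }
}
```

The `(None, Some(idx))` arm is the case the review's Java note flags: a query built without a metric against, say, a cosine index now searches with cosine rather than L2.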