Skip to content

fix: vector index type shown as unknown in describe_indices#6122

Merged
Xuanwo merged 3 commits intolance-format:mainfrom
jackye1995:fix-ivf-flat-index-type-unknown
Mar 9, 2026
Merged

fix: vector index type shown as unknown in describe_indices#6122
Xuanwo merged 3 commits intolance-format:mainfrom
jackye1995:fix-ivf-flat-index-type-unknown

Conversation

@jackye1995
Copy link
Copy Markdown
Contributor

@jackye1995 jackye1995 commented Mar 7, 2026

Vector index still needs to be opened to get the right type, otherwise it is shown as unknown.

@github-actions github-actions Bot added the bug Something isn't working label Mar 7, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Mar 7, 2026

PR Review

Summary: Opens vector indices to determine their concrete type (e.g., IVF_PQ, IVF_HNSW_SQ) instead of showing "Unknown" in describe_indices. The root cause is that VectorIndexDetails in the protobuf doesn't store the concrete subtype, so the plugin registry lookup fails.

Performance concern (P1)

describe_indices now performs I/O (open_generic_index) sequentially for every vector index. This opens each index file from object storage to read its type. For datasets with many vector indices, especially on remote stores (S3/GCS/Azure), this could add significant latency.

Consider:

  1. Running the open_generic_index calls concurrently (e.g., futures::future::try_join_all or FuturesUnordered) rather than the sequential for loop.
  2. Alternatively, caching the index type in the manifest/metadata so this I/O isn't needed at all — though that's a larger change.

Missing tests (P1)

Per repo policy: "All bugfixes and features must have corresponding tests." There are no tests added for this change. A test that creates a vector index (e.g., IVF_PQ) and asserts describe_indices returns the correct index_type string would be valuable and would prevent regressions.

Minor: variable shadowing replaced but field_ids still used after rename

The rename from field_ids to field_ids_vec is fine, but note that in the new code block, field_ids (the original &example_metadata.fields reference on line 444) is still used to look up the column name via field_ids.first(). This works correctly since field_ids still refers to the original &Vec<i32> — just noting it's a bit subtle that both field_ids and field_ids_vec coexist. No bug here.

Copy link
Copy Markdown
Contributor

@justinrmiller justinrmiller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, my only concern is if the collect gets too large, but that also applies to the previous version.

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 7, 2026

Codecov Report

❌ Patch coverage is 86.44068% with 8 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance/src/index.rs 86.44% 7 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@github-actions github-actions Bot added the python label Mar 7, 2026
@wjones127
Copy link
Copy Markdown
Contributor

FYI this will also be fixed by #6099, but that will take longer to merge.

@jackye1995
Copy link
Copy Markdown
Contributor Author

FYI this will also be fixed by #6099, but that will take longer to merge.

Nice, totally agree that's the right way. I guess the question is do we think this is a regression so we should have this in place first. But if we are not patching it, then it does not really make much sense

@Xuanwo Xuanwo merged commit 7eed0db into lance-format:main Mar 9, 2026
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working python

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants