refactor: deprecate list_indices and migrate tests to describe_indices#5945
Conversation
|
@westonpace, could you please review this when you have a chance? Thanks! |
westonpace
left a comment
There was a problem hiding this comment.
I think this will introduce some downstream work in lancedb but hopefully it won't be too bad.
I see no issue with the PR itself. I have some slight concern whether we are ready to completely abandon index_stats or not.
I think the main thing missing in describe_indices is the ability to figure out what type of vector index an index has (e.g. flat or hnsw, ivf, pq or sq, etc.) It would be nice if we wrote that information in the vector index details.
Still, for the places where users were relying on the type they can still use the deprecated method for now I think.
Interested in others opinions (CC @wjones127 @wkalt @BubbleCal)
| for idx in indices: | ||
| if field_id in idx.fields: | ||
| # Use index_stats to get the concrete IVF subtype. | ||
| index_type = self.stats.index_stats(idx.name).get("index_type", "") |
There was a problem hiding this comment.
I think index_stats can be an expensive call but maybe not. I wonder if we could further refine this to only consider columns that have a list or fixed-size-list data type? Either way though, this is an improvement over the past behavior so I think it's ok.
Adding |
Yes, this would probably be best done in a follow-up.
Yes. Anything that isn't too large. It goes in the manifest and we don't want the manifest to get too big but 1-2KB per index should be acceptable (1-2KB isn't a formal guideline, just something off the top of my head). We can easily fit the distance_type and dimension in (and number of partitions and subvectors). I would avoid things like the IVF centroids or PQ codebook. |
Created follow‑up issue #5963. I’ll follow up when I have time. |
|
Thanks again! |
closes #5237
Deprecate
list_indicesand switch tests/benchmarks todescribe_indices/index_stats.