28 commits
af877d5
draft vector index details
wjones127 Mar 4, 2026
080ac46
populate and surface VectorIndexDetails through describe_indices
wjones127 Mar 4, 2026
eb9afee
use Serialize structs for vector details JSON, add snapshot tests
wjones127 Mar 4, 2026
36fd159
remove flat variant
wjones127 Mar 4, 2026
822b6aa
infer vector index details eagerly on load and migration
wjones127 Mar 5, 2026
6269205
add test for legacy vector index details inference and migration
wjones127 Mar 5, 2026
d507027
fix proto field numbering gap, extract inference helper
wjones127 Mar 5, 2026
ca6251f
Merge remote-tracking branch 'upstream/main' into feat/vector-index-d…
wjones127 Mar 5, 2026
0a8e394
wip: cleanup
wjones127 Mar 5, 2026
dd70fee
fix python test
wjones127 Mar 5, 2026
45b9995
move VectorIndexDetails and HnswIndexDetails from table.proto to inde…
wjones127 Mar 5, 2026
aa10e5e
fix review issues: inverted comment, log->tracing, append carry-forwa…
wjones127 Mar 5, 2026
e88ed2f
enhance tests
wjones127 Mar 16, 2026
7c986a9
Merge branch 'main' into feat/vector-index-details
wjones127 Mar 16, 2026
5e2deaf
Merge remote-tracking branch 'upstream/main' into feat/vector-index-d…
wjones127 Mar 16, 2026
2cdfbd2
feat: read metric type from index metadata without I/O
wjones127 Mar 16, 2026
d361895
address PR review feedback for vector index details
wjones127 Mar 17, 2026
eccdfa8
Merge branch 'feat/vector-index-details' of github.com:wjones127/lanc…
wjones127 Mar 23, 2026
30f814a
Merge remote-tracking branch 'upstream/main' into feat/vector-index-d…
wjones127 Mar 23, 2026
4104b25
feat: add max_level to HnswParameters proto
wjones127 Mar 23, 2026
b8a99fb
fix: update Python binding to use VectorIndexDetails from lance_index
wjones127 Mar 23, 2026
1fa9c17
feat: add runtime_hints map to VectorIndexDetails
wjones127 Apr 7, 2026
95c616a
chore: merge upstream/main into feat/vector-index-details
wjones127 Apr 7, 2026
e9bb5a5
feat: add vector_params_from_details and wire skip_transpose into reb…
wjones127 Apr 7, 2026
11bc126
clippy
wjones127 Apr 7, 2026
da888fb
fix: wire max_iters kwarg and fix binding compile errors
wjones127 Apr 8, 2026
5702f43
clippy
wjones127 Apr 8, 2026
c3860e6
fix compat tests
wjones127 Apr 8, 2026
1 change: 1 addition & 0 deletions java/lance-jni/Cargo.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions java/lance-jni/src/utils.rs
@@ -426,6 +426,7 @@ pub fn get_vector_index_params(
stages,
version: IndexFileVersion::V3,
skip_transpose: false,
runtime_hints: Default::default(),
})
},
)?;
61 changes: 61 additions & 0 deletions protos/index.proto
@@ -184,6 +184,67 @@ message VectorIndex {
VectorMetricType metric_type = 4;
}

// Details for vector indexes, stored in the manifest's index_details field.
message VectorIndexDetails {
Member

Where are the IVF details? Was it hierarchical? How many partitions in each stage?

Contributor Author

I'm using VectorIndexDetails here for the IVF stage. I suppose I could make it a substruct to make it clearer. The only parameter right now is target_partition_size.

> How many partitions in each stage?

What do you mean by this?

VectorMetricType metric_type = 1;

// The target number of vectors per partition.
// 0 means unset.
uint64 target_partition_size = 2;

// Optional HNSW index configuration. If set, the index has an HNSW layer.
optional HnswParameters hnsw_index_config = 3;

message ProductQuantization {
uint32 num_bits = 1;
uint32 num_sub_vectors = 2;
}
message ScalarQuantization {
uint32 num_bits = 1;
}
message RabitQuantization {
enum RotationType {
FAST = 0;
MATRIX = 1;
}
uint32 num_bits = 1;
RotationType rotation_type = 2;
}

// No quantization; vectors are stored as-is.
message FlatCompression {}

oneof compression {
Contributor

Maybe worth adding index_version for each one.
I've found it's complicated for vector indexes to maintain compatibility when introducing breaking changes, because it's hard to get the index version.

The scalar indexes already have index_version today.

ProductQuantization pq = 4;
Comment thread
wjones127 marked this conversation as resolved.
ScalarQuantization sq = 5;
RabitQuantization rq = 6;
FlatCompression flat = 8;
}

// The version of the index file format. Useful for maintaining backwards
// compatibility when introducing breaking changes to the index format.
// 0 means unset (legacy index).
uint32 index_version = 7;

// Runtime hints: optional build preferences that don't affect index structure.
// Keys use reverse-DNS namespacing (e.g., "lance.ivf.max_iters", "lancedb.accelerator").
// Unrecognized keys must be silently ignored by all runtimes.
map<string, string> runtime_hints = 9;
}

// Hierarchical Navigable Small World (HNSW) parameters, used as an optional configuration for IVF indexes.
message HnswParameters {
// The maximum number of outgoing edges per node in the HNSW graph. Higher values
// mean more connections, better recall, but more memory and slower builds.
// Referred to as "M" in the HNSW literature.
uint32 max_connections = 1;
// "construction exploration factor": The size of the dynamic list used during
// index construction.
uint32 construction_ef = 2;
Comment thread
wjones127 marked this conversation as resolved.
// The maximum number of levels in the HNSW graph.
uint32 max_level = 3;
}

message JsonIndexDetails {
string path = 1;
google.protobuf.Any target_details = 2;
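
The `runtime_hints` comment in this hunk specifies reverse-DNS keys and requires that unrecognized keys be silently ignored. A minimal Python sketch of a reader that follows that rule might look like the following. The `KNOWN_HINTS` table and the `read_runtime_hints` helper are hypothetical names used only for illustration, not part of Lance, and parsing `lance.ivf.max_iters` as an integer is an assumption.

```python
from typing import Callable, Dict

# Hypothetical table of hint keys this runtime understands, mapped to a parser.
# The key names come from the proto comment; the parsers are assumptions.
KNOWN_HINTS: Dict[str, Callable[[str], object]] = {
    "lance.ivf.max_iters": int,
    "lancedb.accelerator": str,
}


def read_runtime_hints(hints: Dict[str, str]) -> Dict[str, object]:
    """Return only the hints this runtime recognizes, parsed to typed values.

    Unrecognized keys are skipped without raising, as the proto comment requires.
    """
    parsed: Dict[str, object] = {}
    for key, raw in hints.items():
        parser = KNOWN_HINTS.get(key)
        if parser is None:
            continue  # unknown namespace or key: ignore silently
        parsed[key] = parser(raw)
    return parsed


stored = {
    "lance.ivf.max_iters": "100",
    "lancedb.accelerator": "cuda",
    "example.vendor.unknown": "whatever",  # not in KNOWN_HINTS, so dropped
}
print(read_runtime_hints(stored))
```

Keeping values as strings in the map and parsing them at the point of use means a runtime that does not know a key never has to interpret its value.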
3 changes: 1 addition & 2 deletions protos/table.proto
@@ -474,8 +474,7 @@ message ExternalFile {
uint64 size = 3;
}

-// Empty details messages for older indexes that don't take advantage of the details field.
-message VectorIndexDetails {}
+// VectorIndexDetails and HnswParameters (formerly HnswIndexDetails) moved to index.proto

message FragmentReuseIndexDetails {

51 changes: 49 additions & 2 deletions python/python/tests/compat/test_vector_indices.py
@@ -71,6 +71,17 @@ def check_read(self):
)
assert result.num_rows == 4

if hasattr(ds, "describe_indices"):
indices = ds.describe_indices()
assert len(indices) >= 1
name = indices[0].name
elif self.compat_version >= "0.39.0":
indices = ds.list_indices()
assert len(indices) >= 1
name = indices[0]["name"]
stats = ds.stats.index_stats(name)
assert stats["num_indexed_rows"] > 0

def check_write(self):
"""Verify can insert vectors and rebuild index."""
ds = lance.dataset(self.path)
@@ -140,6 +151,18 @@ def check_read(self):
)
assert result.num_rows == 4

if hasattr(ds, "describe_indices"):
indices = ds.describe_indices()
assert len(indices) >= 1
name = indices[0].name
else:
indices = ds.list_indices()
assert len(indices) >= 1
name = indices[0]["name"]

stats = ds.stats.index_stats(name)
assert stats["num_indexed_rows"] > 0

def check_write(self):
"""Verify can insert vectors and rebuild index."""
ds = lance.dataset(self.path)
@@ -209,6 +232,18 @@ def check_read(self):
)
assert result.num_rows == 4

if hasattr(ds, "describe_indices"):
indices = ds.describe_indices()
assert len(indices) >= 1
name = indices[0].name
else:
indices = ds.list_indices()
assert len(indices) >= 1
name = indices[0]["name"]

stats = ds.stats.index_stats(name)
assert stats["num_indexed_rows"] > 0

def check_write(self):
"""Verify can insert vectors and rebuild index."""
ds = lance.dataset(self.path)
@@ -226,9 +261,9 @@ def check_write(self):
ds.optimize.compact_files()


-@compat_test(min_version="0.39.0")
+@compat_test(min_version="4.0.0-beta.8")
class IvfRqVectorIndex(UpgradeDowngradeTest):
-"""Test IVF_RQ vector index compatibility."""
+"""Test IVF_RQ vector index compatibility. V2 was introduced in v4.0.0-beta.8"""

def __init__(self, path: Path):
self.path = path
@@ -273,6 +308,18 @@ def check_read(self):
)
assert result.num_rows == 4

if hasattr(ds, "describe_indices"):
indices = ds.describe_indices()
assert len(indices) >= 1
name = indices[0].name
else:
indices = ds.list_indices()
assert len(indices) >= 1
name = indices[0].name

stats = ds.stats.index_stats(name)
assert stats["num_indexed_rows"] > 0

def check_write(self):
"""Verify can insert vectors and run optimize workflows."""
ds = lance.dataset(self.path)
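
The same `describe_indices` / `list_indices` fallback is repeated in each `check_read` in this file. As a rough sketch, it could be factored into one helper. The name `first_index_name` is hypothetical, the two stub classes only stand in for datasets from newer and older Lance versions, and the sketch assumes `list_indices()` returns dictionaries with a `"name"` key, as the first three hunks do.

```python
from types import SimpleNamespace


def first_index_name(ds) -> str:
    """Return the name of the first index, using whichever listing API exists."""
    if hasattr(ds, "describe_indices"):
        indices = ds.describe_indices()
        assert len(indices) >= 1
        return indices[0].name  # newer API: objects with attributes
    indices = ds.list_indices()
    assert len(indices) >= 1
    return indices[0]["name"]  # older API: plain dictionaries


class NewDataset:
    """Stub for a dataset that exposes describe_indices()."""

    def describe_indices(self):
        return [SimpleNamespace(name="vector_idx")]


class OldDataset:
    """Stub for a dataset that only exposes list_indices()."""

    def list_indices(self):
        return [{"name": "vector_idx"}]


print(first_index_name(NewDataset()), first_index_name(OldDataset()))
```

Because the helper always returns a name or fails an assertion, the later `index_stats(name)` call can never see an unbound `name`.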
40 changes: 39 additions & 1 deletion python/python/tests/test_vector_index.py
@@ -1677,7 +1677,7 @@ def test_describe_vector_index(indexed_dataset: LanceDataset):
info = indexed_dataset.describe_indices()[0]

assert info.name == "vector_idx"
-assert info.type_url == "/lance.table.VectorIndexDetails"
+assert info.type_url == "/lance.index.pb.VectorIndexDetails"
Member

Regrettably, I do not think we can change the type URL for backwards compatibility reasons.

Contributor Author

I can move that back if it's necessary.

assert info.index_type == "IVF_PQ"
assert info.num_rows_indexed == 1000
assert info.fields == [0]
@@ -1688,6 +1688,44 @@ def test_describe_vector_index(indexed_dataset: LanceDataset):
assert info.segments[0].index_version == 1
assert info.segments[0].created_at is not None

details = info.details
assert details["metric_type"] == "L2"
assert details["compression"]["type"] == "pq"
assert details["compression"]["num_bits"] == 8
assert details["compression"]["num_sub_vectors"] == 16


def test_describe_index_runtime_hints_stored(tmp_path):
tbl = create_table(nvec=300, ndim=16)
dataset = lance.write_dataset(tbl, tmp_path)
dataset = dataset.create_index(
"vector",
index_type="IVF_PQ",
num_partitions=4,
num_sub_vectors=4,
max_iters=100,
sample_rate=512,
)
details = dataset.describe_indices()[0].details
hints = details.get("runtime_hints", {})
assert hints.get("lance.ivf.max_iters") == "100"
assert hints.get("lance.ivf.sample_rate") == "512"
assert hints.get("lance.pq.max_iters") == "100"
assert hints.get("lance.pq.sample_rate") == "512"


def test_describe_index_runtime_hints_defaults_omitted(tmp_path):
tbl = create_table(nvec=300, ndim=16)
dataset = lance.write_dataset(tbl, tmp_path)
dataset = dataset.create_index(
"vector",
index_type="IVF_PQ",
num_partitions=4,
num_sub_vectors=4,
)
details = dataset.describe_indices()[0].details
assert "runtime_hints" not in details


def test_optimize_indices(indexed_dataset):
data = create_table()
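
The two `runtime_hints` tests in this hunk pin down the observable behavior: explicit `max_iters` and `sample_rate` values appear as strings under both `lance.ivf.*` and `lance.pq.*`, and nothing is recorded when the caller omits them. The sketch below reproduces that mapping in plain Python. `build_runtime_hints` is a hypothetical helper, and it assumes a hint is omitted whenever the keyword argument was not passed; the actual Rust implementation is not shown in this diff and may decide this differently, for example by comparing against default values.

```python
from typing import Dict, Optional


def build_runtime_hints(
    max_iters: Optional[int] = None,
    sample_rate: Optional[int] = None,
) -> Dict[str, str]:
    """Map explicitly passed build kwargs to string-valued hint entries."""
    hints: Dict[str, str] = {}
    # The tests show the same value recorded for both the IVF and PQ stages.
    for stage in ("ivf", "pq"):
        if max_iters is not None:
            hints[f"lance.{stage}.max_iters"] = str(max_iters)
        if sample_rate is not None:
            hints[f"lance.{stage}.sample_rate"] = str(sample_rate)
    return hints


print(build_runtime_hints(max_iters=100, sample_rate=512))
print(build_runtime_hints())
```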
13 changes: 13 additions & 0 deletions python/src/dataset.rs
@@ -3569,6 +3569,12 @@ fn prepare_vector_index_params(
sq_params.sample_rate = sample_rate;
}

if let Some(max_iters) = kwargs.get_item("max_iters")? {
let max_iters: usize = max_iters.extract()?;
ivf_params.max_iters = max_iters;
pq_params.max_iters = max_iters;
}

// Parse IVF params
if let Some(n) = kwargs.get_item("num_partitions")? {
ivf_params.num_partitions = Some(n.extract()?)
@@ -3731,6 +3737,13 @@ fn prepare_vector_index_params(
}?;
params.version(index_file_version);
params.skip_transpose(skip_transpose);
if let Some(kwargs) = kwargs
&& let Some(acc) = kwargs.get_item("accelerator")?
{
params
.runtime_hints
.insert("lancedb.accelerator".to_string(), acc.to_string());
}
Ok(params)
}

3 changes: 1 addition & 2 deletions python/src/indices.rs
@@ -441,8 +441,7 @@ async fn do_load_shuffled_vectors(
dataset_version: ds.manifest.version,
fragment_bitmap: Some(ds.fragments().iter().map(|f| f.id as u32).collect()),
index_details: Some(Arc::new(
-prost_types::Any::from_msg(&lance_table::format::pb::VectorIndexDetails::default())
-.unwrap(),
+prost_types::Any::from_msg(&lance_index::pb::VectorIndexDetails::default()).unwrap(),
)),
index_version: IndexType::IvfPq.version(),
created_at: Some(Utc::now()),
14 changes: 14 additions & 0 deletions rust/lance-index/src/vector/bq.rs
@@ -7,6 +7,7 @@ use std::iter::once;
use std::str::FromStr;
use std::sync::Arc;

use crate::pb::vector_index_details::RabitQuantization;
use arrow_array::types::Float32Type;
use arrow_array::{Array, ArrayRef, UInt8Array, cast::AsArray};
use lance_core::{Error, Result};
@@ -121,6 +122,19 @@ impl RQBuildParams {
}
}

impl From<&RQBuildParams> for RabitQuantization {
fn from(value: &RQBuildParams) -> Self {
use crate::pb::vector_index_details::rabit_quantization::RotationType;
Self {
num_bits: value.num_bits as u32,
rotation_type: match value.rotation_type {
RQRotationType::Fast => RotationType::Fast as i32,
RQRotationType::Matrix => RotationType::Matrix as i32,
},
}
}
}

impl QuantizerBuildParams for RQBuildParams {
fn sample_size(&self) -> usize {
0
10 changes: 10 additions & 0 deletions rust/lance-index/src/vector/hnsw/builder.rs
@@ -60,6 +60,16 @@ pub struct HnswBuildParams {
pub prefetch_distance: Option<usize>,
}

impl From<&HnswBuildParams> for crate::pb::HnswParameters {
fn from(params: &HnswBuildParams) -> Self {
Self {
max_connections: params.m as u32,
construction_ef: params.ef_construction as u32,
max_level: params.max_level as u32,
}
}
}

impl Default for HnswBuildParams {
fn default() -> Self {
Self {
9 changes: 9 additions & 0 deletions rust/lance-index/src/vector/pq/builder.rs
@@ -44,6 +44,15 @@ pub struct PQBuildParams {
pub sample_rate: usize,
}

impl From<&PQBuildParams> for crate::pb::vector_index_details::ProductQuantization {
fn from(params: &PQBuildParams) -> Self {
Self {
num_bits: params.num_bits as u32,
num_sub_vectors: params.num_sub_vectors as u32,
}
}
}

impl Default for PQBuildParams {
fn default() -> Self {
Self {
8 changes: 8 additions & 0 deletions rust/lance-index/src/vector/sq/builder.rs
@@ -12,6 +12,14 @@ pub struct SQBuildParams {
pub sample_rate: usize,
}

impl From<&SQBuildParams> for crate::pb::vector_index_details::ScalarQuantization {
fn from(params: &SQBuildParams) -> Self {
Self {
num_bits: params.num_bits as u32,
}
}
}

impl Default for SQBuildParams {
fn default() -> Self {
Self {
2 changes: 1 addition & 1 deletion rust/lance/src/dataset/index.rs
@@ -17,9 +17,9 @@ use async_trait::async_trait;
use lance_core::{Error, Result};
use lance_encoding::version::LanceFileVersion;
use lance_index::frag_reuse::FRAG_REUSE_INDEX_NAME;
+use lance_index::pb::VectorIndexDetails;
use lance_index::scalar::lance_format::LanceIndexStore;
use lance_table::format::IndexMetadata;
-use lance_table::format::pb::VectorIndexDetails;
use serde::{Deserialize, Serialize};

use super::optimize::{IndexRemapper, IndexRemapperOptions};
1 change: 1 addition & 0 deletions rust/lance/src/dataset/optimize.rs
@@ -4091,6 +4091,7 @@ mod tests {
],
version: crate::index::vector::IndexFileVersion::V3,
skip_transpose: false,
runtime_hints: Default::default(),
},
false,
)