I get the error:
Io error: Execution error: Query Execution error: Execution error: Query Execution error: Execution error: Not found: path/to/temp_ivfpq_to_batches.lance/_indices/97975599-23c9-4c5b-8eb4-730a4de34cb0/page_lookup.lance, /home/runner/work/lance/lance/rust/lance-io/src/local.rs:100:31, /rustc/eeb90cda1969383f56a2637cbd3037bdf598841c/library/core/src/task/poll.rs:288:44, /home/runner/work/lance/lance/rust/lance/src/io/exec/take.rs:65:42
(looks like lots of error catching and re-throwing, oh boy)
Simple repro:
import lance
import numpy as np
import pyarrow as pa
import pyarrow.compute as pc
ds_uri = "temp_ivfpq_to_batches.lance"
# Generate data
dims = 4
nrows = 500
def next_batch(batch_size, offset):
    values = pc.random(dims * batch_size).cast('float32')
    return pa.table({
        'id': pa.array([offset + j for j in range(batch_size)]),
        'vector': pa.FixedSizeListArray.from_arrays(values, dims),
    }).to_batches()[0]
def batch_iter(num_rows):
    i = 0
    while i < num_rows:
        batch_size = min(10_000, num_rows - i)
        yield next_batch(batch_size, i)
        i += batch_size
schema = next_batch(1, 0).schema
ds = lance.write_dataset(batch_iter(nrows), ds_uri, schema=schema, mode="overwrite")
# No vector index yet, so this will not crash:
next(iter(ds.to_batches(filter="vector is not null")))
# Create index
metric = "L2"
index_type = "IVF_PQ"
num_partitions = 256
num_sub_vectors = 2
column = "vector"
ds.create_index(
    column=[column],
    metric=metric,
    index_type=index_type,
    num_partitions=num_partitions,
    num_sub_vectors=num_sub_vectors,
)
# Crash occurs now that vector index is present
next(iter(ds.to_batches(filter="vector is not null")))
Seems to repro on the release from last week, which is up-to-date with all merged PRs at the time of writing.
The same error occurs if one builds a second time with an accelerator enabled, since that code path uses the filter type shown above. I originally ran into this while doing so over S3, but S3 turns out to be unrelated.
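For anyone debugging this locally, it may help to check which files actually exist under the index directory named in the error. The following is a minimal sketch, assuming a dataset on the local filesystem and the `<dataset>/_indices/<uuid>/` layout seen in the error path above. `list_index_files` and `missing_index_files` are hypothetical helper names, not part of lance, and they use only the standard library.

```python
from pathlib import Path


def list_index_files(ds_uri):
    """Map each index directory under <ds_uri>/_indices to its sorted file names.

    Assumes a local dataset path; the layout is inferred from the error message.
    """
    indices_dir = Path(ds_uri) / "_indices"
    if not indices_dir.is_dir():
        return {}
    return {
        d.name: sorted(p.name for p in d.iterdir() if p.is_file())
        for d in sorted(indices_dir.iterdir())
        if d.is_dir()
    }


def missing_index_files(ds_uri, expected_name="page_lookup.lance"):
    """Return the index directory names that do not contain expected_name."""
    return [
        index_id
        for index_id, files in list_index_files(ds_uri).items()
        if expected_name not in files
    ]
```

Calling `missing_index_files("temp_ivfpq_to_batches.lance")` after the `create_index` step would show whether the vector index directory simply has no `page_lookup.lance`, which would suggest the filter path is looking for a file that this index type never writes. That interpretation is a guess, not something confirmed here.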