Summary
lance==2.0.1 can panic while decoding a single primitive timestamp column when metadata forces
structural-encoding=miniblock + compression=zstd.
Panic site:
lance-encoding/src/encodings/logical/primitive.rs:1305
unreachable!: Mini-block dictionary encoding must use Variable, Flat, or General compression
Python surface error:
ArrowInvalid: External error: RuntimeError: Task was aborted
Versions
- Python: lance==2.0.1
- Rust crates on the writer side: lance = 2.0.1, lance-encoding = 2.0.1
Repro
import tempfile
from pathlib import Path
import numpy as np
import pyarrow as pa
import lance

# Run-heavy, non-decreasing millisecond timestamps (10,000 rows).
rng = np.random.default_rng(6)
runs = np.minimum(rng.geometric(0.45, 500_000), 200)
incs = rng.integers(1, 201, runs.size)
vals = np.repeat(np.cumsum(np.r_[1_586_995_200_000, incs[:-1]]), runs)[:10_000]
# Field metadata that forces miniblock structural encoding with zstd compression.
meta = {b'lance-encoding:structural-encoding': b'miniblock', b'lance-encoding:compression': b'zstd', b'lance-encoding:compression-level': b'3'}
field = pa.field('timestamp', pa.timestamp('ms'), False, metadata=meta)
table = pa.Table.from_arrays([pa.array(vals, type=pa.timestamp('ms'))], schema=pa.schema([field]))
with tempfile.TemporaryDirectory() as d:
    uri = str(Path(d) / 'repro.lance')
    lance.write_dataset(table, uri, mode='create', max_rows_per_file=1_048_576, max_rows_per_group=131_072, data_storage_version='2.2', enable_stable_row_ids=True, enable_v2_manifest_paths=True)
    lance.dataset(uri).to_table(columns=['timestamp'])  # read/decode step
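To show what kind of column the repro produces, here is a small sketch that regenerates the same values with NumPy only and reports how repetitive they are. It does not call lance. Whether the writer actually picks dictionary encoding for this data depends on lance's internal heuristics, which this sketch does not model; the names n_rows, n_unique and is_sorted are illustrative.

```python
import numpy as np

# Same generator and seed as the repro above.
rng = np.random.default_rng(6)
runs = np.minimum(rng.geometric(0.45, 500_000), 200)
incs = rng.integers(1, 201, runs.size)
vals = np.repeat(np.cumsum(np.r_[1_586_995_200_000, incs[:-1]]), runs)[:10_000]

# Basic shape of the data: row count, distinct values, and ordering.
n_rows = vals.size
n_unique = np.unique(vals).size
is_sorted = bool(np.all(np.diff(vals) >= 0))

print(f"rows={n_rows} unique={n_unique} ratio={n_unique / n_rows:.2f} sorted={is_sorted}")
```

The values are non-decreasing with repeated runs, so the number of distinct timestamps is smaller than the row count.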
Observed
- The repro fails consistently: a background panic, surfaced in Python as ArrowInvalid.
Expected
- No panic in reader threads.
- Either successful decode or a graceful validation error.
Additional data point
Setting LANCE_ENCODING_DICT_TOO_SMALL=99999999 before write avoids this failure in my environment,
which suggests a dictionary/miniblock interaction in this path.
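A minimal sketch of applying that workaround from inside Python. The helper temp_env is hypothetical (not part of lance), and it assumes the variable is read at write time rather than once at import or process start. If that assumption does not hold, export the variable in the shell before launching Python instead.

```python
import os
from contextlib import contextmanager


@contextmanager
def temp_env(name, value):
    """Temporarily set an environment variable, restoring the old state on exit."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        # Restore the prior value, or remove the variable if it was unset before.
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


# Intended usage with the repro's `table` and `uri` (left commented so this
# block runs without lance installed):
# with temp_env("LANCE_ENCODING_DICT_TOO_SMALL", "99999999"):
#     lance.write_dataset(table, uri, mode="create", data_storage_version="2.2")
```

Scoping the variable to the write keeps the rest of the process on default settings, which makes it easier to confirm that the write-side setting alone changes the outcome.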