Description
With a BTREE scalar index, an OR filter over a column containing NULL values returns a different row count than the same filter run as a full scan.
The scalar index path mishandles three-valued logic when combining AllowList and BlockList masks, so NULL rows end up in the result set. In the example below, both predicates evaluate to NULL for the NULL row, so that row should be filtered out.
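As a reference for the expected behaviour, here is a minimal pure-Python sketch of SQL-style (Kleene) three-valued logic applied to the filter from the reproduction. It does not use Lance or reflect its internals. The helpers `kleene_or`, `ne`, `lt`, and `count_matches` are hypothetical names introduced only for illustration, and `None` stands in for NULL.

```python
from typing import Optional

def kleene_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """Three-valued OR: True wins, otherwise NULL (None) propagates."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def ne(x: Optional[int], y: int) -> Optional[bool]:
    """x != y, where a NULL operand yields NULL."""
    return None if x is None else x != y

def lt(x: Optional[int], y: int) -> Optional[bool]:
    """x < y, where a NULL operand yields NULL."""
    return None if x is None else x < y

def count_matches(values: list) -> int:
    """Count rows for which (c1 != 0) OR (c1 < 5) is strictly True."""
    # A filter keeps a row only when the predicate is True; NULL is dropped.
    return sum(1 for v in values if kleene_or(ne(v, 0), lt(v, 5)) is True)

print(count_matches([None, 1]))  # prints 1
```

Under these semantics the NULL row evaluates to NULL OR NULL, which is NULL, so only the row with value 1 passes. That matches the full-scan count.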
How to reproduce
import tempfile
import lance
import pyarrow as pa
# One NULL and one non-NULL value
data = pa.table({"c1": pa.array([None, 1], type=pa.int64())})
filter_expr = "(c1 != 0) OR (c1 < 5)"
with tempfile.TemporaryDirectory() as d:
    lance.write_dataset(data, f"{d}/no")
    lance.write_dataset(data, f"{d}/ix")
    ds_no = lance.dataset(f"{d}/no")
    ds_ix = lance.dataset(f"{d}/ix")
    ds_ix.create_scalar_index("c1", index_type="BTREE")
    print("no_index:", ds_no.to_table(filter=filter_expr).num_rows)
    print("with_index:", ds_ix.to_table(filter=filter_expr).num_rows)
Result
no_index: 1
with_index: 2