[Bug] BTREE index incorrect row count on OR predicate with NULLs #5895

@fenfeng9

Description

With a BTREE scalar index, an OR filter over a column that contains NULLs returns a different row count than a full scan.
Under SQL three-valued logic, both sides of the OR evaluate to NULL for a NULL row, so the combined predicate is NULL and the row must be excluded. The scalar index path mishandles this when combining AllowList and BlockList masks, so NULL rows end up in the result set.
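To make the expected OR semantics concrete, here is a minimal sketch that tracks, for each predicate, which row ids are TRUE and which are NULL. `TriState` and `or_combine` are hypothetical names made up for illustration; this is not Lance's internal mask representation, only the behavior a correct combination should match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriState:
    """Hypothetical predicate result: row ids that are TRUE and row ids that are NULL.

    Any row id in neither set is FALSE.
    """
    true_rows: frozenset
    null_rows: frozenset


def or_combine(a: TriState, b: TriState) -> TriState:
    """Three-valued OR: TRUE wins over NULL, NULL wins over FALSE."""
    true_rows = a.true_rows | b.true_rows
    # A row stays NULL only if neither side is TRUE for it.
    null_rows = (a.null_rows | b.null_rows) - true_rows
    return TriState(true_rows, null_rows)


# Row 0 holds NULL, row 1 holds the value 1 (same data as the repro).
ne_zero = TriState(true_rows=frozenset({1}), null_rows=frozenset({0}))  # c1 != 0
lt_five = TriState(true_rows=frozenset({1}), null_rows=frozenset({0}))  # c1 < 5

combined = or_combine(ne_zero, lt_five)

# Only rows that are definitely TRUE should be returned by the filter.
selected = sorted(combined.true_rows)
print("selected row ids:", selected)
print("null row ids:", sorted(combined.null_rows))
```

Row 0 ends up in the NULL set of the combined result, so a filter should return only row 1.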

How to reproduce

import tempfile

import lance
import pyarrow as pa

# One NULL and one non-NULL value
data = pa.table({"c1": pa.array([None, 1], type=pa.int64())})
# Both sides of the OR are NULL for the NULL row, so that row should not match
filter_expr = "(c1 != 0) OR (c1 < 5)"

with tempfile.TemporaryDirectory() as d:
    # Write the same data twice: one dataset stays unindexed, the other gets a BTREE index
    lance.write_dataset(data, f"{d}/no")
    lance.write_dataset(data, f"{d}/ix")
    ds_no = lance.dataset(f"{d}/no")
    ds_ix = lance.dataset(f"{d}/ix")
    ds_ix.create_scalar_index("c1", index_type="BTREE")

    # Both counts should be identical
    print("no_index:", ds_no.to_table(filter=filter_expr).num_rows)
    print("with_index:", ds_ix.to_table(filter=filter_expr).num_rows)

Result

no_index: 1
with_index: 2
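For reference, the expected count of 1 can be checked by hand with a per-row evaluation of the filter under three-valued logic. This is a plain-Python sketch with `None` standing in for NULL; the helper names (`kleene_or`, `ne`, `lt`) are made up for this example and are not part of Lance or PyArrow.

```python
from typing import Optional


def kleene_or(a: Optional[bool], b: Optional[bool]) -> Optional[bool]:
    """SQL-style OR where None represents NULL."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def ne(x: Optional[int], y: int) -> Optional[bool]:
    """x != y, yielding NULL when x is NULL."""
    return None if x is None else x != y


def lt(x: Optional[int], y: int) -> Optional[bool]:
    """x < y, yielding NULL when x is NULL."""
    return None if x is None else x < y


rows = [None, 1]  # same values as column c1 in the repro

# A filter keeps a row only when the predicate is TRUE (not NULL, not FALSE).
kept = [v for v in rows if kleene_or(ne(v, 0), lt(v, 5)) is True]
print("expected rows:", len(kept))
```

This agrees with the `no_index` count, which suggests the full scan is the correct side of the comparison.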

Labels: bug, critical-fix
