
Zone map and bloom filter don't seem to handle deletions correctly #4758

@wjones127


I am getting fewer results than expected from filtered queries when the dataset has both a zone map index and deleted rows.

Reproduction

import pyarrow as pa
import lance

data = pa.table({
    "id": range(10),
    "value": [True, False] * 5,
})
ds = lance.write_dataset(data, "memory://")
ds.delete("NOT value") # Works if I comment this out

ds.to_table(filter="value")
pyarrow.Table
id: int64
value: bool
----
id: [[0,2,4,6,8]]
value: [[true,true,true,true,true]]

But if I add a zone map index, I get only the first three results:

ds.create_scalar_index("value", "ZONEMAP")
ds.to_table(filter="value")
pyarrow.Table
id: int64
value: bool
----
id: [[0,2,4]]
value: [[true,true,true]]
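
To make the expected behavior concrete, here is a minimal pure-Python sketch of zone-map pruning over a table with deleted rows. It is a toy model and not Lance's implementation: `Zone`, `build_zones`, `scan`, and the zone size of 4 are hypothetical names and values chosen for illustration. In this model, zones are addressed by physical row offset and the deletion mask is applied after pruning, so deleting rows never changes which rows a zone covers.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    start: int        # first physical row offset covered by the zone
    length: int       # number of rows covered, including deleted ones
    min_value: bool
    max_value: bool


def build_zones(values, zone_size):
    """Compute min/max statistics for each fixed-size run of row offsets."""
    zones = []
    for start in range(0, len(values), zone_size):
        chunk = values[start:start + zone_size]
        zones.append(Zone(start, len(chunk), min(chunk), max(chunk)))
    return zones


def scan(ids, values, deleted, zones, target):
    """Return ids of live rows equal to `target`, skipping zones that cannot match."""
    out = []
    for zone in zones:
        if not (zone.min_value <= target <= zone.max_value):
            continue  # zone pruned: no row in it can match
        for offset in range(zone.start, zone.start + zone.length):
            if offset in deleted:
                continue  # deletion mask applied after pruning
            if values[offset] == target:
                out.append(ids[offset])
    return out


ids = list(range(10))
values = [True, False] * 5
# Mirrors ds.delete("NOT value"): mark every row whose value is False as deleted.
deleted = {offset for offset, value in enumerate(values) if not value}
zones = build_zones(values, zone_size=4)
result = scan(ids, values, deleted, zones, target=True)
print(result)  # [0, 2, 4, 6, 8]
```

Under these assumptions the indexed scan returns the same five ids as the unindexed query above, which is the result the zone map query would be expected to produce.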

Labels

bug: Something isn't working
critical-fix: Bugs that cause crashes, security vulnerabilities, or incorrect data.
