fix: ensure recheck for IsNotNull in bloom filter#5192

Merged
wjones127 merged 3 commits into main from fix-5171
Nov 7, 2025

Collaborator

@Xuanwo Xuanwo commented Nov 7, 2025

This PR will fix #5171.

BloomFilter only provides an AtMost result for IS NULL. Our planner implements IS NOT NULL as NOT (IS NULL), so negating the BloomFilter result turns it into AtLeast, which is incorrect for BloomFilter. Additionally, FilteredReadExec treats AtLeast as exact, which can drop many ranges that might contain non-null values.

We fixed this by adding a recheck before applying NOT to BloomFilter.
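
To illustrate the failure mode described above, here is a minimal, self-contained sketch. It is not the code from this PR: the `SearchResult` enum, `negate`, and both read helpers are hypothetical stand-ins, rows are modeled as plain `u32` ids, and the recheck is assumed to mean "re-evaluate the predicate on every row that is not guaranteed to match".

```rust
use std::collections::BTreeSet;

type RowSet = BTreeSet<u32>;

/// Hypothetical model of an index search result (illustrative names only).
#[derive(Debug, Clone, PartialEq)]
enum SearchResult {
    /// Exactly the matching rows.
    Exact(RowSet),
    /// A superset of the matching rows (false positives possible).
    AtMost(RowSet),
    /// A subset of the matching rows (matches may be missing).
    AtLeast(RowSet),
}

/// Negating a result complements the set and flips the bound:
/// a superset of the matches becomes a subset of the non-matches.
fn negate(result: &SearchResult, universe: &RowSet) -> SearchResult {
    let complement = |s: &RowSet| -> RowSet { universe.difference(s).copied().collect() };
    match result {
        SearchResult::Exact(s) => SearchResult::Exact(complement(s)),
        SearchResult::AtMost(s) => SearchResult::AtLeast(complement(s)),
        SearchResult::AtLeast(s) => SearchResult::AtMost(complement(s)),
    }
}

/// The problematic behaviour: the row set is read as-is, ignoring the bound.
fn rows_if_treated_as_exact(result: &SearchResult) -> RowSet {
    match result {
        SearchResult::Exact(s) | SearchResult::AtMost(s) | SearchResult::AtLeast(s) => s.clone(),
    }
}

/// Assumed recheck strategy for IS NOT NULL: an AtLeast result is only a
/// lower bound, so scan the whole universe and re-evaluate the predicate.
fn is_not_null_with_recheck(
    result: &SearchResult,
    universe: &RowSet,
    column: &[Option<i32>],
) -> RowSet {
    let candidates = match result {
        SearchResult::Exact(s) | SearchResult::AtMost(s) => s.clone(),
        SearchResult::AtLeast(_) => universe.clone(),
    };
    candidates
        .into_iter()
        .filter(|&row| column[row as usize].is_some())
        .collect()
}

/// Returns (rows read when AtLeast is treated as exact, rows read with recheck).
fn demo() -> (RowSet, RowSet) {
    // Row 1 is the only NULL; the filter also flags row 2 (a false positive).
    let column = [Some(1), None, Some(3), Some(4)];
    let universe: RowSet = (0..column.len() as u32).collect();
    let is_null = SearchResult::AtMost(RowSet::from([1, 2]));

    let is_not_null = negate(&is_null, &universe);
    (
        rows_if_treated_as_exact(&is_not_null),
        is_not_null_with_recheck(&is_not_null, &universe, &column),
    )
}
```

In this toy setup, reading the negated result as exact yields rows 0 and 3 and silently drops row 2, which is non-null but was a false positive for IS NULL. The recheck variant returns rows 0, 2, and 3.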


This PR was primarily authored with Codex using GPT-5-Codex and then hand-reviewed by me. I AM responsible for every change made in this PR. I aimed to keep it aligned with our goals, though I may have missed minor issues. Please flag anything that feels off and I'll fix it quickly.

Signed-off-by: Xuanwo <github@xuanwo.io>
Signed-off-by: Xuanwo <github@xuanwo.io>
@Xuanwo Xuanwo requested a review from wjones127 November 7, 2025 12:51
@github-actions github-actions Bot added the bug Something isn't working label Nov 7, 2025
Contributor

@wjones127 wjones127 left a comment


Thanks for fixing this!

@wjones127 wjones127 merged commit 10dbd72 into main Nov 7, 2025
29 of 31 checks passed
@wjones127 wjones127 deleted the fix-5171 branch November 7, 2025 23:18
jackye1995 pushed a commit to jackye1995/lance that referenced this pull request Jan 21, 2026
Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

Bloom filter does not handle IS NOT NULL correctly

2 participants