fix: handle NULL elements in LABEL_LIST index results and explain_plan#5867
Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
PTAL @westonpace . |
Add tests for List<str>, List<int>, Struct, and List<Struct<str>> covering scan, take, and filter (including NOT/OR variants) with and without indices (LabelList, BTree, Bitmap). Data includes null list elements, null lists, null struct fields, and null struct elements in lists to catch regressions like lance-format#5867. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Negation is currently broken for NULL lists in the index path. As a result, we can’t distinguish “list is NULL” from “list doesn’t contain the value”, and NULL is treated as FALSE; NOT then incorrectly returns it. For the test case
But since Row 2 is missing from the index, it's treated as FALSE → NOT becomes TRUE, causing the negation query to incorrectly return it. |
|
|
||
| def test_label_list_index_null_element_match(tmp_path: Path): | ||
| """Ensure LABEL_LIST index keeps scan semantics when lists contain NULLs.""" | ||
| tbl = pa.table({"labels": [["foo", None], ["foo"], None]}) |
There was a problem hiding this comment.
We should add a case where there are nulls but it shouldn't contain "foo".
| tbl = pa.table({"labels": [["foo", None], ["foo"], None]}) | |
| tbl = pa.table({"labels": [["foo", None], ["foo"], ["bar", None], None]}) |
| "NOT array_has_any(labels, ['foo'])", | ||
| "NOT array_has_all(labels, ['foo'])", | ||
| "NOT array_contains(labels, 'foo')", |
There was a problem hiding this comment.
If you want to address these in a different issue / PR, feel free to comment out the failing one and add a comment with a link to a follow up issue.
There was a problem hiding this comment.
Thanks for the suggestion! I'll create a follow-up issue and submit a PR to address these separately.
Co-authored-by: Will Jones <willjones127@gmail.com>
1c10037 to
bc7897b
Compare
|
There are two distinct issues.
This PR fixes (1) by clearing element-level NULLs in the LABEL_LIST index path, and splits the unit tests to cover element-level vs list-level NULLs. Examples:
|
- Clear element-level nulls in label_list searches - Update null-handling tests for label_list
Add tests for List<str>, List<int>, Struct, and List<Struct<str>> covering scan, take, and filter (including NOT/OR variants) with and without indices (LabelList, BTree, Bitmap). Data includes null list elements, null lists, null struct fields, and null struct elements in lists to catch regressions like lance-format#5867. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
LabelList index still has issues with null element handling despite PR lance-format#5867 and PR lance-format#5914. Tests pass without LabelList index. Re-enable when fully fixed. Issue: lance-format#5682 Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
closes #5682
changes:
LabelListQuery::to_exprto preventexplain_plan()panics.