
fix: handle NULL elements in LABEL_LIST index results and explain_plan#5867

Merged
wjones127 merged 5 commits intolance-format:mainfrom
fenfeng9:fix/label-list-null-handling
Feb 10, 2026


@fenfeng9
Contributor

fenfeng9 commented Jan 31, 2026

closes #5682

Changes:

  • Treat element-level NULLs in LABEL_LIST as non-matches so array_has_any/array_has_all return TRUE/FALSE when the list itself is non-NULL.
  • Allow nullable list literals in LabelListQuery::to_expr to prevent explain_plan() panics.
  • Add Python tests covering element-level NULLs, list-level NULLs, NULL-literal filters and explain behavior.

github-actions added the bug ("Something isn't working") and python labels on Jan 31, 2026
@codecov

codecov Bot commented Jan 31, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.

| File with missing lines | Patch % | Lines |
|---|---|---|
| rust/lance-index/src/scalar.rs | 50.00% | 1 missing ⚠️ |


@fenfeng9
Contributor Author

fenfeng9 commented Feb 2, 2026

PTAL @westonpace.

Comment thread python/python/tests/test_scalar_index.py
wjones127 added a commit to wjones127/lance that referenced this pull request Feb 5, 2026
Add tests for List<str>, List<int>, Struct, and List<Struct<str>>
covering scan, take, and filter (including NOT/OR variants) with and
without indices (LabelList, BTree, Bitmap).

Data includes null list elements, null lists, null struct fields, and
null struct elements in lists to catch regressions like lance-format#5867.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@fenfeng9
Contributor Author

fenfeng9 commented Feb 6, 2026

Negation is currently broken for NULL lists in the index path.
LabelListIndex flattens each List<T> into scalar rows for a BitmapIndex. unnest_batch drops rows whose list is itself NULL (such a row yields zero unnested values), so those rows never enter the bitmap.

As a result, the index can’t distinguish “list is NULL” from “list doesn’t contain the value”. NULL is treated as FALSE, and NOT then turns it into TRUE, so the row is incorrectly returned.

For the test case [["foo", None], ["foo"], None]:

  • Row 0: TRUE (contains "foo") → NOT → FALSE ✓
  • Row 1: TRUE (contains "foo") → NOT → FALSE ✓
  • Row 2: NULL (list itself is NULL) → NOT → should be NULL (filtered out)

But since Row 2 is missing from the index, it's treated as FALSE → NOT becomes TRUE, causing the negation query to incorrectly return it.
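The mismatch can be reproduced with a toy model in plain Python. This is a sketch, not Lance's implementation: scan_contains, scan_not_contains, and index_not_contains are hypothetical names, and the index path is modeled only as described above (NULL lists are dropped during unnesting, and NOT is computed as the complement of the matching rows).

```python
from typing import List, Optional, Set

Labels = Optional[List[Optional[str]]]


def scan_contains(labels: Labels, value: str) -> Optional[bool]:
    """Scan semantics: a NULL list yields NULL (None); otherwise TRUE iff value is present."""
    if labels is None:
        return None
    return value in labels  # NULL elements never equal a non-NULL value


def sql_not(v: Optional[bool]) -> Optional[bool]:
    """Three-valued NOT: NOT NULL is still NULL."""
    return None if v is None else not v


def scan_not_contains(rows: List[Labels], value: str) -> Set[int]:
    # A filter keeps only rows whose predicate is TRUE; NULL rows are dropped.
    return {i for i, labels in enumerate(rows) if sql_not(scan_contains(labels, value)) is True}


def index_not_contains(rows: List[Labels], value: str) -> Set[int]:
    # Hypothetical model of the index path: unnesting skips NULL lists, so they
    # never enter the bitmap, and NOT is the complement of the matching rows.
    matches = {
        i
        for i, labels in enumerate(rows)
        if labels is not None
        for elem in labels
        if elem == value
    }
    return set(range(len(rows))) - matches


rows = [["foo", None], ["foo"], None]
print(scan_not_contains(rows, "foo"))   # set()
print(index_not_contains(rows, "foo"))  # {2}
```

The scan path drops Row 2 because NOT NULL is NULL, while the complement-based model returns it because the NULL list was never recorded anywhere.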


def test_label_list_index_null_element_match(tmp_path: Path):
    """Ensure LABEL_LIST index keeps scan semantics when lists contain NULLs."""
    tbl = pa.table({"labels": [["foo", None], ["foo"], None]})
Contributor


We should add a case where a list contains NULLs but doesn't contain "foo".

Suggested change:
- tbl = pa.table({"labels": [["foo", None], ["foo"], None]})
+ tbl = pa.table({"labels": [["foo", None], ["foo"], ["bar", None], None]})
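The extra ["bar", None] row is the one that separates two candidate semantics. A small plain-Python sketch (the helper names are hypothetical) compares SQL IN-style NULL propagation, where an unmatched list containing a NULL element is unknown, against the NULL-ignoring behavior this PR targets. The original three rows evaluate identically under both; only the new row differs.

```python
from typing import List, Optional

Labels = Optional[List[Optional[str]]]


def has_any_propagating(labels: Labels, values: List[str]) -> Optional[bool]:
    """SQL IN-style: without a match, any NULL element makes the result NULL."""
    if labels is None:
        return None
    if any(x in values for x in labels if x is not None):
        return True
    return None if any(x is None for x in labels) else False


def has_any_ignoring(labels: Labels, values: List[str]) -> Optional[bool]:
    """NULL elements are ignored; only a NULL list yields NULL."""
    if labels is None:
        return None
    return any(x in values for x in labels if x is not None)


rows = [["foo", None], ["foo"], ["bar", None], None]
print([has_any_propagating(r, ["foo"]) for r in rows])  # [True, True, None, None]
print([has_any_ignoring(r, ["foo"]) for r in rows])     # [True, True, False, None]
```

Under the NULL-ignoring semantics, a scan of NOT array_has_any(labels, ['foo']) should therefore return the ["bar", None] row and nothing else.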

Contributor Author


Updated.

Comment on lines +2066 to +2068
"NOT array_has_any(labels, ['foo'])",
"NOT array_has_all(labels, ['foo'])",
"NOT array_contains(labels, 'foo')",
Contributor


If you want to address these in a different issue or PR, feel free to comment out the failing cases and add a comment linking to a follow-up issue.

Contributor Author


Thanks for the suggestion! I'll create a follow-up issue and submit a PR to address these separately.

fenfeng9 force-pushed the fix/label-list-null-handling branch from 1c10037 to bc7897b on February 7, 2026 07:23
@fenfeng9
Contributor Author

fenfeng9 commented Feb 7, 2026

There are two distinct issues.

  1. Element-level NULLs (NULL items inside a non-NULL list): array_has_any / array_has_all should ignore NULL elements, so results are strictly TRUE/FALSE (no NULL propagation).

  2. List-level NULLs (the list itself is NULL): these still break NOT semantics and are tracked separately in #5904, "LabelListIndex: NOT filters mis-handle NULL lists (list-level NULLs)".

This PR fixes (1) by clearing element-level NULLs in the LABEL_LIST index path, and splits the unit tests to cover element-level vs list-level NULLs.
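One way to picture the fix is the toy model below. It assumes, without reproducing Lance's actual code, that before the fix a search over the flattened values reported a row as NULL when it had a NULL element and no match, and that the fix simply clears that NULL set; unnest, search_contains, negate, and the clear_element_nulls flag are all hypothetical. Because unnesting still skips NULL lists, the last row is returned by NOT in both cases, which is the list-level issue left to #5904.

```python
from typing import List, Optional, Set, Tuple

Labels = Optional[List[Optional[str]]]
Result = Tuple[Set[int], Set[int]]  # (TRUE rows, NULL rows)


def unnest(rows: List[Labels]) -> List[Tuple[int, Optional[str]]]:
    """Flatten lists into (row_id, element) pairs; NULL lists produce no pairs."""
    return [(i, elem) for i, labels in enumerate(rows) if labels is not None for elem in labels]


def search_contains(rows: List[Labels], value: str, clear_element_nulls: bool) -> Result:
    true_rows: Set[int] = set()
    null_rows: Set[int] = set()
    for row_id, elem in unnest(rows):
        if elem is None:
            null_rows.add(row_id)
        elif elem == value:
            true_rows.add(row_id)
    # TRUE OR NULL is TRUE, so a row with a match is never NULL.
    null_rows -= true_rows
    if clear_element_nulls:
        null_rows = set()  # element-level NULLs become plain non-matches
    return true_rows, null_rows


def negate(rows: List[Labels], result: Result) -> Set[int]:
    """Rows returned by a NOT filter: only FALSE rows flip to TRUE; NULL stays NULL."""
    true_rows, null_rows = result
    return set(range(len(rows))) - true_rows - null_rows


rows = [["foo", None], ["foo"], ["bar", None], None]
print(negate(rows, search_contains(rows, "foo", clear_element_nulls=False)))  # {3}
print(negate(rows, search_contains(rows, "foo", clear_element_nulls=True)))   # {2, 3}
```

In this model, clearing element-level NULLs brings the ["bar", None] row back into the NOT result, while row 3 (the NULL list) is still wrongly included.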

Examples:

| Expression | Result | Reason |
|---|---|---|
| array_has_any(["foo", NULL], ["foo"]) | TRUE | Found "foo"; NULL elements are ignored |
| NOT array_has_any(["bar", NULL], ["foo"]) | TRUE | No match; NULL ignored → FALSE → NOT FALSE = TRUE |
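The intended scan-level semantics for both functions fit in a short reference model. This is a plain-Python sketch that assumes the query list itself contains no NULLs (NULL-literal filters are outside its scope); array_has_any, array_has_all, and sql_not here are illustrative stand-ins, not Lance or DataFusion functions.

```python
from typing import List, Optional

Labels = Optional[List[Optional[str]]]


def array_has_any(labels: Labels, values: List[str]) -> Optional[bool]:
    """TRUE if any non-NULL element is in values; a NULL list yields NULL."""
    if labels is None:
        return None
    return any(x in values for x in labels if x is not None)


def array_has_all(labels: Labels, values: List[str]) -> Optional[bool]:
    """TRUE if every value appears among the non-NULL elements; a NULL list yields NULL."""
    if labels is None:
        return None
    present = {x for x in labels if x is not None}
    return all(v in present for v in values)


def sql_not(v: Optional[bool]) -> Optional[bool]:
    """Three-valued NOT: NOT NULL is still NULL."""
    return None if v is None else not v


# The two rows of the examples table.
print(array_has_any(["foo", None], ["foo"]))            # True
print(sql_not(array_has_any(["bar", None], ["foo"])))   # True
# array_has_all follows the same rule.
print(array_has_all(["foo", None], ["foo"]))            # True
print(array_has_all(["foo", None], ["foo", "bar"]))     # False
print(sql_not(array_has_all(None, ["foo"])))            # None
```

Element-level NULLs only ever remove candidates, so results for a non-NULL list are strictly TRUE or FALSE; NULL appears only when the list itself is NULL.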

  - Clear element-level nulls in label_list searches
  - Update null-handling tests for label_list
Contributor

wjones127 left a comment


Nice work!

wjones127 merged commit 827a59a into lance-format:main on Feb 10, 2026
30 checks passed
wjones127 added a commit to wjones127/lance that referenced this pull request Feb 10, 2026
wjones127 added a commit to wjones127/lance that referenced this pull request Feb 10, 2026
LabelList index still has issues with null element handling despite PR lance-format#5867 and PR lance-format#5914.
Tests pass without LabelList index. Re-enable when fully fixed.

Issue: lance-format#5682

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>


Linked issue (closed by merging this pull request): LABEL_LIST index returns incorrect results when list has NULL elements

2 participants