
fix: handle DataType::Null in adjust_child_validity to prevent panic #6160

Merged
wjones127 merged 2 commits into lance-format:main from wjones127:will/ent-990-panic-in-adjust_child_validity-when-merging-null-typed
Mar 11, 2026

@wjones127

Previously, `adjust_child_validity` would call `ArrayData::try_new` with a null bitmap on a `DataType::Null` array, causing an `.unwrap()` panic with `InvalidArgumentError("Arrays of type Null cannot contain a null bitmask")`.

The trigger: when a user inserts rows where a struct sub-field has only null values, Arrow infers `DataType::Null` for that column. If a subsequent fragment omits that nullable sub-field, Lance inserts a `NullReader` to fill it in. `MergeStream` then merges the real batch (with null struct rows) and the `NullReader` batch (all-null struct), recursing into the struct, where `adjust_child_validity` is called with the `Null`-typed child and a non-empty parent validity, which triggers the panic.

Fix: skip the bitmask operation when `child.data_type() == DataType::Null`. A `Null` array is always entirely null by definition and needs no validity adjustment.
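To make the failure mode concrete, here is a minimal toy model in Rust. It does not use lance-arrow or arrow-rs: `DataType`, `Array`, `Array::try_new`, and this `adjust_child_validity` are simplified, hypothetical stand-ins. It assumes that `adjust_child_validity` pushes the parent struct's validity into the child with a logical AND of the two bitmaps, and it reproduces Arrow's rule that a `Null`-typed array may not carry a null bitmap. The early return for `DataType::Null` plays the role of the guard described above; without it, a `Null` child combined with a parent validity reaches the `.unwrap()` and panics.

```rust
// Toy model only: these types are hypothetical stand-ins, not the
// lance-arrow or arrow-rs API.

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Null,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
struct Array {
    data_type: DataType,
    len: usize,
    // `None` = no null bitmap. For Null-typed arrays every slot is null anyway.
    validity: Option<Vec<bool>>,
}

impl Array {
    // Mirrors Arrow's validation rule: a Null-typed array must not carry a bitmap.
    fn try_new(data_type: DataType, len: usize, validity: Option<Vec<bool>>) -> Result<Array, String> {
        if data_type == DataType::Null && validity.is_some() {
            return Err("Arrays of type Null cannot contain a null bitmask".to_string());
        }
        if let Some(bits) = &validity {
            if bits.len() != len {
                return Err("validity length does not match array length".to_string());
            }
        }
        Ok(Array { data_type, len, validity })
    }
}

// Assumed behavior: a child slot stays valid only if its parent row is valid too.
fn adjust_child_validity(child: &Array, parent_validity: Option<&[bool]>) -> Array {
    // The guard: a Null array is all-null by definition, so there is nothing
    // to adjust, and attaching a bitmap would be rejected by try_new.
    if child.data_type == DataType::Null {
        return child.clone();
    }
    let Some(parent) = parent_validity else {
        return child.clone();
    };
    // AND the child's validity with the parent's; no child bitmap means all-valid.
    let combined: Vec<bool> = match &child.validity {
        Some(bits) => bits.iter().zip(parent).map(|(c, p)| *c && *p).collect(),
        None => parent.to_vec(),
    };
    // Without the guard above, this unwrap panics for a Null child.
    Array::try_new(child.data_type.clone(), child.len, Some(combined)).unwrap()
}
```

In this model, a `Null` child passes through unchanged and stays bitmap-free, while an `Int64` child with validity `[true, true, false]` under parent validity `[true, false, true]` comes out as `[true, false, false]`.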

Closes #6159

`adjust_child_validity` would call `ArrayData::try_new` with a null
bitmap on a `DataType::Null` array. Arrow rejects this with
`InvalidArgumentError("Arrays of type Null cannot contain a null
bitmask")`, causing an `.unwrap()` panic at lance-arrow/src/lib.rs:1187.

The panic occurs when a struct column has null rows and one of its
sub-fields has `DataType::Null` — which Arrow infers when a column
contains only null values (e.g. a Python/pandas all-None column). When a
later fragment omits that nullable sub-field, Lance inserts a NullReader
to fill it in. MergeStream then merges the real batch (with null struct
rows) and the NullReader batch (all-null struct), recursing into the
struct where `adjust_child_validity` is called with the Null-typed child
and a non-empty parent validity — triggering the panic.

Fix: skip the bitmap operation when `child.data_type() == DataType::Null`.
A Null array is always entirely null by definition and needs no
validity adjustment.

Fixes lance-format#6159

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
github-actions bot added the bug label ("Something isn't working") on Mar 10, 2026
@github-actions

Review

Clean, well-documented bugfix. The root cause analysis is thorough and the fix is minimal and correct.

No blocking issues found. The early return for DataType::Null is the right approach — these arrays are all-null by definition and Arrow explicitly rejects null bitmaps on them. The fix is in the single chokepoint (adjust_child_validity) that all callers go through.

Both the unit test (direct merge call) and the integration test (end-to-end dataset scan across fragments) cover the scenario well.

LGTM.

wjones127 marked this pull request as ready for review on March 10, 2026 20:38

esteban left a comment

+1 lgtm.

@codecov
codecov Bot commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


wjones127 merged commit 3ccd602 into lance-format:main on Mar 11, 2026
30 checks passed
westonpace pushed a commit that referenced this pull request Mar 17, 2026
…6160)

Previously, `adjust_child_validity` would call `ArrayData::try_new` with
a null bitmap on a `DataType::Null` array, causing an `.unwrap()` panic
with `InvalidArgumentError("Arrays of type Null cannot contain a null
bitmask")`.

The trigger: when a user inserts rows where a struct sub-field has only
null values, Arrow infers `DataType::Null` for that column. If a
subsequent fragment omits that nullable sub-field, Lance inserts a
`NullReader` to fill it in. `MergeStream` then merges the real batch
(with null struct rows) and the `NullReader` batch (all-null struct),
recursing into the struct where `adjust_child_validity` is called with
the `Null`-typed child and a non-empty parent validity — triggering the
panic.

Fix: skip the bitmask operation when `child.data_type() ==
DataType::Null`. A `Null` array is always entirely null by definition
and needs no validity adjustment.

Closes #6159

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
westonpace added a commit that referenced this pull request Mar 18, 2026
## Summary

Cherry-picks bug fixes onto `release/v3.0` for the v3.0.1 patch release:

- **#6160** - fix: handle `DataType::Null` in `adjust_child_validity` to
prevent panic
- **#6187** - fix: handle nullable validity layers without def levels
- **#6143** - fix: prevent duplicate manifest entries from concurrent
table creation
- **#6212** - chore: bump lz4_flex patch versions
- **#6146** - fix: replace fetch_arrow_table with to_arrow_table

## Test plan

- CI passes on cherry-picked commits (all five PRs were already merged and
tested on main)

---------

Co-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Xuanwo <github@xuanwo.io>
Co-authored-by: Jonathan Hsieh <jon@lancedb.com>
Co-authored-by: BubbleCal <bubble-cal@outlook.com>

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

Panic in adjust_child_validity when merging Null-typed arrays: "Arrays of type Null cannot contain a null bitmask"

3 participants