Skip to content

Conversation

@amorynan
Copy link

For solving : #3859
Add methods to push struct-level validity into child fields:

  • push_validity_into_children(preserve_struct_validity: bool)
  • push_validity_into_children_default() - convenience method with preserve=false

The functionality propagates null information from struct level down to individual fields, with options to preserve or remove the struct-level validity.

Includes comprehensive tests covering all scenarios:

  • preserve_struct_validity = true
  • preserve_struct_validity = false (default)
  • no nulls edge case

Add methods to push struct-level validity into child fields:
- push_validity_into_children(preserve_struct_validity: bool)
- push_validity_into_children_default() - convenience method with preserve=false

The functionality propagates null information from struct level down to
individual fields, with options to preserve or remove the struct-level validity.

Includes comprehensive tests covering all scenarios:
- preserve_struct_validity = true
- preserve_struct_validity = false (default)
- no nulls edge case

Signed-off-by: amorynan <amorywang111@gmail.com>
@connortsui20 connortsui20 added the feature Release label indicating a new feature or request label Dec 27, 2025
@codecov
Copy link

codecov bot commented Dec 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.96%. Comparing base (1e0f608) to head (cefd1e1).
⚠️ Report is 14 commits behind head on develop.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@connortsui20 connortsui20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good first step, though I'm trying to figure out how we want to implement this behavior for the new operator world we are migrating to.

@gatesn mentioned this in the original issue:

Since this issue was filed, we now have a "Mask" expression that essentially performs intersection with validity.

So you could take a look at how you might take the validity from the struct array itself, and wrap up each child field in a mask expression (see builtins.rs), before constructing a new StructArray with or without the previous validity. These two are different behaviors.

This PR implements this feature correctly for the old (current) world, but we will likely want to implement the behavior described above very very soon, so soon that it might not even be worth merging this right now? @gatesn do you have any thoughts?



pub fn push_validity_into_children(&self, preserve_struct_validity: bool) -> VortexResult<Self> {
use crate::compute::mask;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this import to the top?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also prefer to see compute::mask as the function call instead of just mask, but that is just my personal preference

Comment on lines +480 to +502
/// // Create struct with top-level nulls
/// let struct_array = StructArray::try_new(
/// ["a", "b"].into(),
/// vec![
/// buffer![1i32, 2i32, 3i32].into_array(),
/// buffer![10i32, 20i32, 30i32].into_array(),
/// ],
/// 3,
/// Validity::from_iter([true, false, true]), // row 1 is null
/// ).unwrap();
///
/// // Push validity into children, preserving struct validity
/// let pushed = struct_array.push_validity_into_children(true).unwrap();
/// // pushed.fields()[0] now has nulls at position 1
/// // pushed.fields()[1] now has nulls at position 1
/// // pushed.validity still shows row 1 as null
///
/// // Push validity into children, removing struct validity
/// let pushed_no_struct = struct_array.push_validity_into_children(false).unwrap();
/// // pushed_no_struct.fields()[0] now has nulls at position 1
/// // pushed_no_struct.fields()[1] now has nulls at position 1
/// // pushed_no_struct.validity is AllValid
/// ```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be more helpful to comment what the struct array looks like instead of requiring the reader to parse what the created struct array becomes after try_new. And you might as well have assertions stating that specific positions are null rather than just comments.

Comment on lines +484 to +485
/// buffer![1i32, 2i32, 3i32].into_array(),
/// buffer![10i32, 20i32, 30i32].into_array(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may not be immediately obvious to someone reading this that the child buffers are non-nullable

self.names().clone(),
self.fields().clone(),
self.len(),
Validity::AllValid,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm I wonder if we should make the validity of the top-level struct array Validity::NonNullable instead of AllValid if the user requests to not preserve the struct validity. Do you have any thoughts?

Comment on lines +529 to +540
let null_mask = struct_validity_mask.iter_bools(|iter| {
Mask::from_iter(iter.map(|valid| !valid)) // invert: valid->invalid, invalid->valid
});

let masked_fields: Vec<ArrayRef> = self
.fields()
.iter()
.map(|field| {
// Use the mask function to apply null positions to each field.
mask(field.as_ref(), &null_mask)
})
.collect::<VortexResult<Vec<_>>>()?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This inverted logic makes me think more that we should just move this to the new world now instead of merging this and then getting rid of it immediately. @gatesn Any thoughts about this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature Release label indicating a new feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants