Skip to content

feat: Improve InListExpr types, flatten dict haystacks and validate in try_new_from_array#21402

Open
buraksenn wants to merge 1 commit intoapache:mainfrom
buraksenn:inlist-type-validation
Open

feat: Improve InListExpr types, flatten dict haystacks and validate in try_new_from_array#21402
buraksenn wants to merge 1 commit intoapache:mainfrom
buraksenn:inlist-type-validation

Conversation

@buraksenn
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

To make needle/haystack more clear, add comments and with flattened we can use optimizations.

What changes are included in this PR?

  • flatten dictionary haystacks
  • add detailed comments for needle/haystack and dictionary handling
  • type validation to try_new_from_array

Are these changes tested?

Yes. I've added new tests and existing pass test.

Are there any user-facing changes?

Yes. try_new_from_array now expects data types to be logically equal

@github-actions github-actions Bot added physical-expr Changes to the physical-expr crates physical-plan Changes to the physical-plan crate labels Apr 6, 2026
Copy link
Copy Markdown
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@buraksenn

Looks good overall. No blocking issues from my side. I left a few small suggestions that could help with consistency and test coverage.

/// cheaper to build.
/// Returns an error if the expression's data type and the array's data type
/// are not logically equal. Null arrays are always accepted.
pub fn try_new_from_array(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that try_new_from_array now does its own logical type validation, but try_new still has a very similar check a few lines below.
Would it make sense to extract this into a small helper so both constructors share the same validation path and error message? That might help avoid subtle drift in the future, especially around dictionary or logical equality handling.

}

#[test]
fn test_try_new_from_array_dict_haystack_int32() -> Result<()> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice coverage here for primitive, string, and float dictionary haystacks. One gap I still see is the multi-column or StructArray path, which is what shared_bounds.rs builds for composite hash-join filters.
It could be worth adding a focused try_new_from_array test with a struct haystack to cover the exact dynamic filter path this PR is enabling.

)) as Arc<dyn PhysicalExpr>
};

// Use in_list_from_array() helper to create InList with static_filter optimization (hash-based lookup)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small cohesion nit: this comment still mentions an in_list_from_array() helper, but the code now calls InListExpr::try_new_from_array(...) directly.

Updating the wording so it matches the current API would make it easier to follow when tracing the dynamic filter path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve InListExpr types and expectations

2 participants