Support Binary arrays in starts_with, ends_with and contains #6926

Merged
alamb merged 14 commits into apache:main from rluvaton:add-example-arrow-binary-crate
Jan 22, 2025

Conversation

@rluvaton
Member

@rluvaton rluvaton commented Dec 31, 2024

(Ignore the branch name.)

Which issue does this PR close?

Closes #6923
Closes #6924

What changes are included in this PR?

  1. Created a PredicateImpl trait so the predicate logic works regardless of whether the input is string or binary
  2. Moved the implementation to use the Predicate and made it more generic
  3. Implemented PredicateImpl for the existing Predicate and the new BinaryPredicate using a macro (I don't really like this, as it seems less maintainable, but I'm not sure what's better: duplication, a macro, or another approach)
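The PR's actual PredicateImpl code is not shown in this conversation. As a rough, std-only illustration of the idea in item 1, the sketch below assumes a hypothetical Haystack trait and Predicate enum (neither name is taken from arrow-rs) so that one generic predicate can evaluate starts_with, ends_with, and contains over either str or [u8] values, with None standing in for a null slot.

```rust
/// Hypothetical trait (not arrow-rs's PredicateImpl) abstracting over the
/// value type, so the same predicate code works for strings and bytes.
trait Haystack {
    fn has_prefix(&self, needle: &Self) -> bool;
    fn has_suffix(&self, needle: &Self) -> bool;
    fn has_substring(&self, needle: &Self) -> bool;
}

impl Haystack for str {
    fn has_prefix(&self, needle: &Self) -> bool {
        self.starts_with(needle)
    }
    fn has_suffix(&self, needle: &Self) -> bool {
        self.ends_with(needle)
    }
    fn has_substring(&self, needle: &Self) -> bool {
        self.contains(needle)
    }
}

impl Haystack for [u8] {
    fn has_prefix(&self, needle: &Self) -> bool {
        self.starts_with(needle)
    }
    fn has_suffix(&self, needle: &Self) -> bool {
        self.ends_with(needle)
    }
    fn has_substring(&self, needle: &Self) -> bool {
        // [u8] has no built-in substring search; scan fixed-size windows.
        // An empty needle matches everything (windows(0) would panic).
        needle.is_empty() || self.windows(needle.len()).any(|w| w == needle)
    }
}

/// Hypothetical predicate, generic over any Haystack type (str or [u8]).
enum Predicate<'a, T: ?Sized> {
    StartsWith(&'a T),
    EndsWith(&'a T),
    Contains(&'a T),
}

impl<'a, T: Haystack + ?Sized> Predicate<'a, T> {
    /// Evaluates the predicate against a single non-null value.
    fn evaluate(&self, haystack: &T) -> bool {
        match self {
            Predicate::StartsWith(n) => haystack.has_prefix(*n),
            Predicate::EndsWith(n) => haystack.has_suffix(*n),
            Predicate::Contains(n) => haystack.has_substring(*n),
        }
    }
}

/// Applies the predicate element-wise; null inputs stay null in the output.
fn evaluate_all<T: Haystack + ?Sized>(
    values: &[Option<&T>],
    pred: &Predicate<'_, T>,
) -> Vec<Option<bool>> {
    values.iter().map(|&v| v.map(|h| pred.evaluate(h))).collect()
}
```

In this sketch only the Haystack impls know about the concrete type; the matching and null handling are written once, which is the kind of sharing item 3's macro was trying to achieve.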

Are there any user-facing changes?

Yes, users can pass binary arrays to like, starts_with, contains, and more.

@github-actions github-actions Bot added the arrow Changes to the arrow crate label Dec 31, 2024
@rluvaton rluvaton marked this pull request as ready for review December 31, 2024 19:39
Contributor

@alamb alamb left a comment

I am sorry for the delay in reviewing this PR; it is hard to find time to review such a large PR.

I wonder what the use case is for using LIKE on binary data. I ask because it seems to me that LIKE is mostly useful for character strings.

I can see the use case for starts_with, ends_with, and contains on binary data.

Perhaps instead of trying to inject binary arrays into the code for handling strings, we could have simpler prefix/suffix matching for binary. It might involve some more repetition, but it would be simpler to understand and would avoid any potential performance issues related to this code 🤔
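A minimal sketch of the simpler, binary-only approach suggested here, assuming a hypothetical SimpleBinaryArray that mimics Arrow's variable-length binary layout (an offsets buffer plus a values buffer, with an optional validity mask). It is not the arrow-rs BinaryArray API; each kernel just compares raw bytes per element and propagates nulls.

```rust
/// Hypothetical stand-in for Arrow's variable-length binary layout:
/// element i occupies values[offsets[i]..offsets[i + 1]].
struct SimpleBinaryArray {
    offsets: Vec<i32>,
    values: Vec<u8>,
    /// None would mean every element is valid (non-null).
    validity: Option<Vec<bool>>,
}

impl SimpleBinaryArray {
    /// Builds the offsets/values/validity buffers from optional byte slices.
    fn from_opts(items: &[Option<&[u8]>]) -> Self {
        let mut offsets = vec![0i32];
        let mut values = Vec::new();
        let mut validity = Vec::with_capacity(items.len());
        for item in items {
            if let Some(bytes) = item {
                values.extend_from_slice(bytes);
            }
            validity.push(item.is_some());
            // A null element gets a zero-length slot (offset does not advance).
            offsets.push(values.len() as i32);
        }
        SimpleBinaryArray { offsets, values, validity: Some(validity) }
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Returns element i, or None if it is null.
    fn value(&self, i: usize) -> Option<&[u8]> {
        if let Some(v) = &self.validity {
            if !v[i] {
                return None;
            }
        }
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        Some(&self.values[start..end])
    }
}

/// Applies a byte-level test to every element, keeping nulls as None.
fn map_binary(array: &SimpleBinaryArray, f: impl Fn(&[u8]) -> bool) -> Vec<Option<bool>> {
    (0..array.len()).map(|i| array.value(i).map(|v| f(v))).collect()
}

fn binary_starts_with(array: &SimpleBinaryArray, prefix: &[u8]) -> Vec<Option<bool>> {
    map_binary(array, |v| v.starts_with(prefix))
}

fn binary_ends_with(array: &SimpleBinaryArray, suffix: &[u8]) -> Vec<Option<bool>> {
    map_binary(array, |v| v.ends_with(suffix))
}

fn binary_contains(array: &SimpleBinaryArray, needle: &[u8]) -> Vec<Option<bool>> {
    // Naive window scan; an empty needle matches every non-null value.
    map_binary(array, |v| needle.is_empty() || v.windows(needle.len()).any(|w| w == needle))
}
```

Because these comparisons operate on raw bytes, they make no ASCII or UTF-8 assumption about the data. A production kernel could replace the naive window scan with a faster substring search without changing the interface.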


    impl FixedSizeBinaryArray {
        /// Returns true if all data within this array is ASCII
        pub fn is_ascii(&self) -> bool {
Contributor

I don't understand the need to check a binary array for ASCII; there shouldn't be any optimizations that rely on the data being ASCII.

Member Author

Because I did not want to duplicate the whole like and predicate file, I implemented like for binary as well, and this check was needed for the like implementation.

But instead, I have now implemented only contains, starts_with, and ends_with.

@alamb
Contributor

alamb commented Jan 17, 2025

Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

@alamb alamb marked this pull request as draft January 17, 2025 22:02
@rluvaton rluvaton force-pushed the add-example-arrow-binary-crate branch from 5f9a6c8 to 30b432e January 19, 2025 12:04
@rluvaton rluvaton marked this pull request as ready for review January 19, 2025 16:50
Comment thread arrow-array/src/array/mod.rs
@rluvaton
Member Author

@alamb this PR is now ready for review and is much smaller. Thanks for the feedback.

Contributor

@alamb alamb left a comment

Thank you @rluvaton. I looked through this, and I think it looks quite nice and easy to follow now. I also think the risk of performance regressions for strings is low.

Thank you for your patience

@alamb alamb changed the title add binary support in arrow-string Support Binary arrays in starts_with, ends_with and contains Jan 21, 2025
@alamb
Contributor

alamb commented Jan 21, 2025

I plan to merge this tomorrow unless anyone else would like time to review

@alamb
Contributor

alamb commented Jan 22, 2025

Thanks again @rluvaton

Labels

arrow Changes to the arrow crate

Development

Successfully merging this pull request may close these issues.

arrow-string function should support binary input as well

2 participants