Support Binary arrays in starts_with, ends_with and contains #6926
Support Binary arrays in starts_with, ends_with and contains #6926alamb merged 14 commits intoapache:mainfrom
Binary arrays in starts_with, ends_with and contains #6926Conversation
alamb
left a comment
There was a problem hiding this comment.
I am sorry in the delay reviewing this PR -- it is hard to find time reviewing such a large PR
I wonder what the usecase is for using LIKE on binary data? I as because it seems to me that LIKE is mostly useful for character strings.
I can see the usecase for starts_with / ends_with and contains for binary data,
Perhaps instead of trying to inject binary array into the code for handling strings, we could simply have simpler prefix/suffix matching for binary -- it might have some more repetition but would be simpler to understand any avoid any potential performance issues related to this code 🤔
|
|
||
| impl FixedSizeBinaryArray { | ||
| /// Returns true if all data within this array is ASCII | ||
| pub fn is_ascii(&self) -> bool { |
There was a problem hiding this comment.
I don't understand the need to check a binary array for ASCII -- there shouldn't be any optimizations that rely on the data being ASCII
There was a problem hiding this comment.
because I did not want to duplicate the whole like and predicate file I implemented like and this needed for the like impl.
But instead I only implemented the contains, start with, and ends with
|
Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look |
5f9a6c8 to
30b432e
Compare
|
@alamb this PR is now ready for review and is much smaller. thanks for the feedback |
Binary arrays in starts_with, ends_with and contains
|
I plan to merge this tomorrow unless anyone else would like time to review |
|
Thanks again @rluvaton |
(ignore branch name)
Which issue does this PR close?
Closes #6923
Closes #6924
What changes are included in this PR?
PredicateImpltrait to work with the predicate regardless of string or binaryPredicateImplfor the oldPredicateand the newBinaryPredicateusing macro (I don't really like this as it seem less maintainable, but not sure what's better, duplicating or macro, or another approach)Are there any user-facing changes?
Yes, allow users to pass binary arrays to like/starts with/contains and more