ARROW-11725: [Rust][DataFusion] Make use of the new divide_scalar kernel in arrow #9543
  macro_rules! binary_string_array_op_scalar {
      ($LEFT:expr, $RIGHT:expr, $OP:ident) => {{
-         let result = match $LEFT.data_type() {
+         let result: Result<Arc<dyn Array>> = match $LEFT.data_type() {
I could not find a way to do this type erasure directly in BinaryExpr::evaluate (L419, which only had scalar operations on BooleanArray before), so I'm doing it here and also for binary_array_op_scalar (L244).
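For readers unfamiliar with the pattern, the type erasure mentioned above can be sketched in isolation. This is a minimal illustration only: the `Array` trait, `DataType`, `Scalar`, the two array structs and `divide_by_scalar` are hypothetical stand-ins written for this example, not the arrow or DataFusion definitions, and `String` stands in for DataFusion's error type.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical miniature of an Arrow-like array trait (not the arrow crate's API).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Int64,
    Float64,
}

pub trait Array {
    fn data_type(&self) -> DataType;
    fn len(&self) -> usize;
    fn as_any(&self) -> &dyn Any;
}

pub struct Int64Array(pub Vec<i64>);
pub struct Float64Array(pub Vec<f64>);

impl Array for Int64Array {
    fn data_type(&self) -> DataType { DataType::Int64 }
    fn len(&self) -> usize { self.0.len() }
    fn as_any(&self) -> &dyn Any { self }
}

impl Array for Float64Array {
    fn data_type(&self) -> DataType { DataType::Float64 }
    fn len(&self) -> usize { self.0.len() }
    fn as_any(&self) -> &dyn Any { self }
}

// Hypothetical scalar value type for the right-hand side.
pub enum Scalar {
    Int64(i64),
    Float64(f64),
}

pub fn divide_by_scalar(left: &dyn Array, scalar: &Scalar) -> Result<Arc<dyn Array>, String> {
    // The annotation erases the concrete array type. Without it, the first arm
    // would yield Arc<Int64Array>, the second Arc<Float64Array>, and the arms
    // of the match would not unify.
    let result: Result<Arc<dyn Array>, String> = match (left.data_type(), scalar) {
        (DataType::Int64, Scalar::Int64(d)) => {
            let arr = left
                .as_any()
                .downcast_ref::<Int64Array>()
                .ok_or_else(|| "failed to downcast to Int64Array".to_string())?;
            if *d == 0 {
                return Err("divide by zero".to_string());
            }
            Ok(Arc::new(Int64Array(arr.0.iter().map(|v| v / d).collect())))
        }
        (DataType::Float64, Scalar::Float64(d)) => {
            let arr = left
                .as_any()
                .downcast_ref::<Float64Array>()
                .ok_or_else(|| "failed to downcast to Float64Array".to_string())?;
            Ok(Arc::new(Float64Array(arr.0.iter().map(|v| v / d).collect())))
        }
        (dt, _) => Err(format!("unsupported array/scalar combination for {:?}", dt)),
    };
    result
}
```

Each arm builds a different concrete array, and the annotated `let` is what lets them all be returned as one `Arc<dyn Array>`.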
alamb
left a comment
Thanks @abreis -- looks really nice to me. @jorgecarleitao or @Dandandan any thoughts?
Dandandan
left a comment
LGTM!
alamb
left a comment
Thanks again @abreis
I plan to merge this once the CI goes green
CI failure seems unrelated. Note that the first commit already passed CI, and this second commit only changes a few error strings, so it should be safe to merge.
The integration failure looks like https://issues.apache.org/jira/browse/ARROW-11717
Merged. 🎉 Thanks @abreis
This is a small PR to make DataFusion use the just-merged divide_scalar arrow kernel (#9454).

Performance-wise:
- arrow side: this specialized kernel is ~40-50% faster than the standard divide, mostly due to not having to check for divide-by-zero on every row;
- datafusion side: it can now skip the scalar.to_array_of_size(num_rows) allocation, which should be a decent win for operations on large arrays.

The eventual goal is to have op_scalar variants for every arithmetic operation; divide will show the biggest performance gains, but all variants should save DataFusion a (possibly expensive) allocation.
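
The two savings described above can be illustrated with a small, std-only sketch. It models arrays as slices of `Option<i64>` (with `None` standing in for a null slot). The names `DivError`, `divide`, `divide_scalar` and `divide_via_expanded_scalar` are hypothetical and do not reproduce the signatures of the arrow kernels, and the sketch ignores integer overflow.

```rust
// Hypothetical stand-ins for the kernels discussed in this PR; these are NOT
// the arrow crate's functions or signatures.
#[derive(Debug, PartialEq)]
pub enum DivError {
    DivideByZero,
    LengthMismatch,
}

/// Array / array: the divisor can differ per row, so every row is checked for zero.
pub fn divide(
    left: &[Option<i64>],
    right: &[Option<i64>],
) -> Result<Vec<Option<i64>>, DivError> {
    if left.len() != right.len() {
        return Err(DivError::LengthMismatch);
    }
    left.iter()
        .zip(right.iter())
        .map(|(l, r)| match (l, r) {
            (Some(_), Some(0)) => Err(DivError::DivideByZero),
            (Some(l), Some(r)) => Ok(Some(l / r)),
            _ => Ok(None), // a null on either side yields a null
        })
        .collect()
}

/// Array / scalar: the divisor is fixed, so zero is checked once up front and
/// no divisor array is ever built.
pub fn divide_scalar(
    left: &[Option<i64>],
    divisor: i64,
) -> Result<Vec<Option<i64>>, DivError> {
    if divisor == 0 {
        return Err(DivError::DivideByZero);
    }
    Ok(left.iter().map(|v| v.map(|x| x / divisor)).collect())
}

/// The shape this PR moves away from: expand the scalar into a full-length
/// array (one allocation of `left.len()` elements), then call the general kernel.
pub fn divide_via_expanded_scalar(
    left: &[Option<i64>],
    divisor: i64,
) -> Result<Vec<Option<i64>>, DivError> {
    let expanded = vec![Some(divisor); left.len()];
    divide(left, &expanded)
}
```

For a non-zero divisor both routes give the same result; the scalar route only avoids the per-row zero check and the temporary divisor array. In this sketch the two differ for a zero divisor when `left` is empty or all null, because the scalar version rejects the divisor before looking at any row.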