Skip to content

Conversation

@abreis
Copy link
Contributor

@abreis abreis commented Feb 21, 2021

This is a small PR to make DataFusion use the just-merged divide_scalar arrow kernel (#9454).

Performance-wise:

  • on the arrow side, this specialized kernel is ~40-50% faster than the standard divide, mostly due to not having to check for divide-by-zero on every row;
  • on the datafusion side, it can now skip the scalar.to_array_of_size(num_rows) allocation, which should be a decent win for operations on large arrays.

The eventual goal is to have op_scalar variants for every arithmetic operation — divide will show the biggest performance gains but all variants should save DataFusion a (possibly expensive) allocation.

@github-actions
Copy link

macro_rules! binary_string_array_op_scalar {
($LEFT:expr, $RIGHT:expr, $OP:ident) => {{
let result = match $LEFT.data_type() {
let result: Result<Arc<dyn Array>> = match $LEFT.data_type() {
Copy link
Contributor Author

@abreis abreis Feb 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not find a way to do this type erasure directly in BinaryExpr::evaluate (L419, which only had scalar operations on BooleanArray before), so I'm doing it here and also for binary_array_op_scalar (L244).

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @abreis -- looks really nice to me. @jorgecarleitao or @Dandandan any thoughts?

Copy link
Contributor

@Dandandan Dandandan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks againt @abreis

@alamb
Copy link
Contributor

alamb commented Feb 23, 2021

I plan to merge this once the CI goes green

@abreis
Copy link
Contributor Author

abreis commented Feb 23, 2021

CI failure seems unrelated. Note that the first commit already passed CI, and this second commit only changes a few error strings, so it should be safe to merge.

@alamb
Copy link
Contributor

alamb commented Feb 23, 2021

The integration failure looks like https://issues.apache.org/jira/browse/ARROW-11717

@alamb alamb closed this in 6a5ed0a Feb 23, 2021
@alamb
Copy link
Contributor

alamb commented Feb 23, 2021

Merged. 🎉 Thanks @abreis

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants