perf: Fix NLJ slow join with condition `array_has` #18161
Conversation
alamb left a comment:
I verified that with this fix the reproducer from @ianthetechie from this issue is fixed.

However, I am not sure this code has test coverage. I checked via

```shell
nice cargo llvm-cov --html test --test sqllogictests
nice cargo llvm-cov --html test -p datafusion -p datafusion-physical-plan
```

I will look into adding some coverage.
> Now the performance is similar; I suspect most of the time is spent evaluating the expensive `array_has`, so the optimization in #16996 can't help much.
Yes, I looked at the `array_has` implementation and it is doing a lot of work. I will file a follow-on ticket.

Also, it seems to me that the fix / improvement for `ScalarValue::to_array_of_size()` is more general than just NLJ, so I will file a ticket about that as well.
```rust
scalar_value.to_array_of_size(filtered_probe_batch.num_rows())?
// Avoid using `ScalarValue::to_array_of_size()` for `List(Utf8View)` to avoid
// deep copies for buffers inside `Utf8View` array. See below for details.
// https://github.com/apache/datafusion/issues/18159
```
The root cause is tracked in #18159.
```rust
DataType::List(field) | DataType::LargeList(field)
    if field.data_type() == &DataType::Utf8View =>
{
    let indices_iter = std::iter::repeat_n(
        build_side_index as u64,
        filtered_probe_batch.num_rows(),
    );
    let indices_array = UInt64Array::from_iter_values(indices_iter);
    take(original_left_array.as_ref(), &indices_array, None)?
}
```
I think this approach could be ported into `ScalarValue::to_array_of_size` itself rather than special-cased here, which could potentially improve performance in other places as well:
`datafusion/common/src/scalar/mod.rs`, lines 3238 to 3245 at `556eb9b`:
```rust
fn list_to_array_of_size(arr: &dyn Array, size: usize) -> Result<ArrayRef> {
    let arrays = repeat_n(arr, size).collect::<Vec<_>>();
    let ret = match !arrays.is_empty() {
        true => arrow::compute::concat(arrays.as_slice())?,
        false => arr.slice(0, 0),
    };
    Ok(ret)
}
```
That being said, I think this is a nice point fix that we can safely backport to the DataFusion 50 branch, so I think we should merge this PR and backport it, and I will file a follow-on PR to further improve the code.
THANK YOU very much for this fix and diagnosis @2010YOUY01
Thank you for the review. The feedback makes sense to me, but I can only address it tomorrow. If you're waiting on this patch to be included in the release, feel free to push changes directly to the PR.
Thank you @2010YOUY01. I just pushed a test that adds coverage for this case; I verified it is covered using `cargo llvm-cov`.
```
@@ -0,0 +1,63 @@
# Licensed to the Apache Software Foundation (ASF) under one
```
New test added here.
alamb left a comment:
Thank you @2010YOUY01
Filed a ticket to track making `ScalarValue::to_array_of_size()` faster.

Which issue does this PR close?
Rationale for this change
See the above issue and its comment #18070 (comment)
What changes are included in this PR?
In nested loop join, when the join column includes `List(Utf8View)`, use `take()` instead of `to_array_of_size()` to avoid deep copying the utf8 buffers inside the `Utf8View` array.

This is the quick fix; avoiding the deep copy inside `to_array_of_size()` itself is a bit tricky.

Here is `ListArray`'s physical layout: https://arrow.apache.org/rust/arrow/array/struct.GenericListArray.html

If multiple elements point to the same list range, the underlying payload can't be reused. So a potential fix in `to_array_of_size` can only avoid copying the inner-inner utf8view array buffers; it can't avoid copying the inner array (i.e. views are still copied), and deep copying for other primitive types also can't be avoided. Seems this can be better solved when the `ListView` type is ready 🤔

Benchmark
I tried query 1 in #18070, but only used 3 randomly sampled `places` parquet files.

- 49.0.0: 4s
- 50.0.0: stuck > 1 minute
- This PR: 4s
Now the performance is similar; I suspect most of the time is spent evaluating the expensive `array_has`, so the optimization in #16996 can't help much.

Are these changes tested?
Existing tests
Are there any user-facing changes?
No