feat(datafusion-functions-aggregate): add support for lists and other nested types in min and max#15857
feat(datafusion-functions-aggregate): add support for lists and other nested types in min and max#15857gabotechs wants to merge 15 commits intoapache:mainfrom
Conversation
min and max
min and max| } else if let DataType::List(field) = col_type { | ||
| extract_window_frame_target_type(field.data_type()) |
There was a problem hiding this comment.
Not 100% sure what am I doing here. This seems necessary for window functions over min/max aggregations on lists to work, and does not seem to be breaking other things, but maybe more experienced DF folks can throw some light here.
There was a problem hiding this comment.
It seems to be related to converting window frames when they have RANGE rather than ROWS -- so maybe we need some more tests. I'll comment below
|
Is this related to |
alamb
left a comment
There was a problem hiding this comment.
Thank you @gabotechs -- I think just a few more tests are needed and this one will eb good to go.
FYI @rluvaton
| } else if let DataType::List(field) = col_type { | ||
| extract_window_frame_target_type(field.data_type()) |
There was a problem hiding this comment.
It seems to be related to converting window frames when they have RANGE rather than ROWS -- so maybe we need some more tests. I'll comment below
| /// Accumulator for min/max of lists | ||
| /// this accumulator is using arrow-row as the internal representation and a way for comparing |
There was a problem hiding this comment.
| /// Accumulator for min/max of lists | |
| /// this accumulator is using arrow-row as the internal representation and a way for comparing | |
| /// Accumulator for min/max of lists | |
| /// | |
| /// This accumulator uses arrow-row as the internal representation and a way | |
| /// for comparing rows. |
| fn value(&self) -> Option<&OwnedRow>; | ||
| } | ||
|
|
||
| #[derive(Debug, Default)] |
There was a problem hiding this comment.
| #[derive(Debug, Default)] | |
| /// Similarly to [`GenericMinMaxAccumulator`], implements sliding min/max for Lists | |
| /// using the Row converter | |
| #[derive(Debug, Default)] |
I think the overlap is that the approach in this PR (to use Row format) could have also been used in #15667. As a follow on / if someone cares about performance, we could probably implement a much more efficient (though complex0 list comparison routine that compares element by element rather than copying the entire list element into a Row format |
|
Here's the new PR: |
Which issue does this PR close?
Rationale for this change
Resume the work started by @rluvaton in #13991, adding some tests and adding a small fix in the middle.
What changes are included in this PR?
Add support for performing
MINandMAXaggregations over lists, building on top of #13991 maintaining the original commit history, and only adding commits on top of it.This PR also includes a commit for fixing ScalarValue::List comparisons that could be shipped independently:
Are these changes tested?
Yes, by new sqllogictests
Are there any user-facing changes?
Yes, users can now perform
MINandMAXoperations over lists