-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-41183: [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings #41295
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-41183: [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings #41295
Conversation
|
|
|
@felipecrv @mapleFU PTAL this if you're interested! |
8f01a3a to
c400b46
Compare
| @@ -57,8 +57,14 @@ Result<TypeHolder> LastType(KernelContext*, const std::vector<TypeHolder>& types | |||
| } | |||
|
|
|||
| Result<TypeHolder> ListValuesType(KernelContext*, const std::vector<TypeHolder>& args) { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add a bool recursive parameter (without default value) and skip the for loop if it's false to match existing usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic here seems correct for the recursive = false condition except one more value_kind check on is_list(value_kind) || is_list_view(value_kind).
And it's the output resolver(a callback to identify the kernel functions's output type) which called only once in expression bind or CallFunction's, and have little impact on kernel path. So maybe the logic here could be retained?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe. I started another thread and asked for clarification #41295 (comment)
|
@ZhangHuiGui if you're dealing with bugs/failures in tests, it might be because of this |
b3fa84d to
1298642
Compare
felipecrv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before merge I will do a more detailed review (I don't think I will find issues).
@jorisvandenbossche what you think about the Python changes?
python/pyarrow/array.pxi
Outdated
| @@ -2747,69 +2824,8 @@ cdef class ListViewArray(Array): | |||
| """ | |||
| return pyarrow_wrap_array((<CListViewArray*> self.ap).sizes()) | |||
|
|
|||
| def flatten(self, memory_pool=None): | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pyarrow's mainly changes should be here. [Large]ListViewArray does not seem to need to maintain an independent flatten interface separately, and can directly use the one provided by BaseListArray.
b73fe44 to
d623dce
Compare
felipecrv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please accept the patch suggestions containing typos and grammar corrections.
After these changes, I will get a Python person to give a quick look at the Python parts, then I will merge.
…function 2. Support [Large]ListView for some kernel functions: list_flatten,list_value_length, list_element 3. Support recursive flatten for pyarrow bindinds and simplify [Large]ListView's pyarrow bindings 4. Refactor vector_nested_test.cc for better support [Large]ListView types.
8d1c0f3 to
ce9947a
Compare
danepitkin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (from python perspective)
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I took a look through the Python changes, which seem reasonable except for possibly an explicit test to make sure the value of recursive makes it all the way to C++.
I don't have much experience with the class hierarchy of Arrays and so I can't speak to that change (but it seems like a good idea to only maintain one version of that documentation/interface).
| assert arr0.type.equals(typ0) | ||
| assert arr1.flatten().equals(arr0) | ||
| assert arr2.flatten().flatten().equals(arr0) | ||
| assert arr2.flatten(True).equals(arr0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there also be a test here where flatten(True) and flatten(False) return different things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the recursive parameter defaults to False in the flatten interface, but some of the previous test cases do not include this test. We should add such a test case here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see: assert arr2.flatten(True).equals(arr0) is the case that I had in mind and it was already added (I just missed the distinction between arr1 and arr0). The case you added helps make that distinction clearer...thank you!
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 5e986be. There was 1 benchmark result with an error:
There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 11 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…ist_flatten kernel function and pyarrow bindings (apache#41295) ### Rationale for this change Expose recursive flatten for logical lists on list_flatten kernel function and pyarrow bindings. ### What changes are included in this PR? 1. Expose recursive flatten for logical lists on `list_flatten` kernel function 2. Support [Large]ListView for some kernel functions: `list_flatten`,`list_value_length`, `list_element` 3. Support recursive flatten for pyarrow bindinds and simplify [Large]ListView's pyarrow bindings 4. Refactor vector_nested_test.cc for better support [Large]ListView types. ### Are these changes tested? Yes ### Are there any user-facing changes? Yes. 1. Some kernel functions like: list_flatten, list_value_length, list_element would support [Large]ListView types 2. `list_flatten` and related pyarrow bindings could support flatten recursively with an ListFlattenOptions. * GitHub Issue: apache#41183 Lead-authored-by: ZhangHuiGui <hugo.zhang@openpie.com> Co-authored-by: ZhangHuiGui <2689496754@qq.com> Signed-off-by: Felipe Oliveira Carvalho <felipekde@gmail.com>
Rationale for this change
Expose recursive flatten for logical lists on list_flatten kernel function and pyarrow bindings.
What changes are included in this PR?
list_flattenkernel functionlist_flatten,list_value_length,list_elementAre these changes tested?
Yes
Are there any user-facing changes?
Yes.
list_flattenand related pyarrow bindings could support flatten recursively with an ListFlattenOptions.