ARROW-14656: [Python] sort helper for Array, ChunkedArray and StructArray #11659
python/pyarrow/array.pxi
Outdated
It seems a bit strange to have this method only on StructArray IMO. Also for other arrays, sorting takes some effort with the current API.
This is sorting by a given child field. So it wouldn't make sense on other arrays (including unions).
But it does make sense to sort just the array.
What I mean is, also for sorting an array, it takes some code:
arr.take(pc.array_sort_indices(arr, order))
vs for struct array:
arr.take(pc.array_sort_indices(arr.field(name), order))
Once you have the first snippet, I don't see why the second snippet would necessarily warrant a helper function (and the first not)
Well, we can expose both if we want, though @amol- 's ticket seems to be specifically for sort_by :-)
I know, but my feedback is that I am personally -1 on adding a special-case sort_by method.
I think it would be reasonable to rename the method to StructArray.sort and add a sort method to all Arrays. The signatures would differ (for StructArray you have to provide the field argument), but apart from that we can provide a sort method for all arrays. It would still have value, because it saves users from combining multiple compute functions to get the desired result.
This is currently blocked by:
@amol- Why don't you just
That's what I'm ending up doing for
Yes, using a "dummy" name is also what we do elsewhere: arrow/python/pyarrow/compute.py Lines 722 to 723 in 92ee295
python/pyarrow/tests/test_array.py
Outdated
This should be reverted to omit the field name; it was the acceptance test for sorting struct arrays without any specific field.
support for Table and RecordBatch to NestedValuesComparator
@amol- What is the status on this? Do you want to take it up?
Yes I'll probably not get the time to take this on at least till June, so please feel free to assign it to someone else. Sorry for holding up this PR for so long!