Skip to content

Conversation

@amol-
Copy link
Member

@amol- amol- commented Nov 10, 2021

No description provided.

@github-actions
Copy link

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a bit strange to have this method only on StructArray IMO. Also for other arrays, sorting takes some effort with the current API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sorting by a given child field. So it wouldn't make sense on other arrays (including unions).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it does make sense to sort just the array.

What I mean is, also for sorting an array, it takes some code:

arr.take(pc.array_sort_indices(arr, order))

vs for struct array:

arr.take(pc.array_sort_indices(arr.field(name), order)

Once you have the first snippet, I don't see why the second snippet would necessarily warrant a helper function (and the first not)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we can expose both if we want, though @amol- 's ticket seems to be specifically for sort_by :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, but so my feedback is that I am personally -1 on adding a special case method sort_by

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be resonable to rename the method to StructArray.sort and add it to all Arrays. The signature would be different between them (as in StructArray you have to provide the field argument) but apart from that we can provide a sort method for all arrays and it would have value as it would still prevent combining multiple compute functions to achieve the wanted result.

@amol- amol- marked this pull request as draft November 12, 2021 16:18
@amol- amol- changed the title ARROW-14656: [Python] sort_by helper for StructArray ARROW-14656: [Python] sort helper for Array, ChunkedArray and StructArray Nov 22, 2021
@amol-
Copy link
Member Author

amol- commented Nov 22, 2021

This is currently blocked by:

@pitrou
Copy link
Member

pitrou commented Nov 22, 2021

@amol- Why don't you just sort_indices instead?

@amol-
Copy link
Member Author

amol- commented Nov 22, 2021

@amol- Why don't you just sort_indices instead?

That's what I'm ending up doing for ChunkedArray, I tried passing None as the column name but it explicitly requires a string, so I'll just end up passing an empty one

@jorisvandenbossche
Copy link
Member

That's what I'm ending up doing for ChunkedArray, I tried passing None as the column name but it explicitly requires a string, so I'll just end up passing an empty one

Yes, using "dummy" name is also what we do elsewhere:

if isinstance(values, (pa.Array, pa.ChunkedArray)):
sort_keys.append(("dummy", "descending"))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be revert to omit the fieldname, it was the acceeptance test for sorting struct arrays wiithout any specific field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be revert to omit the fieldname, it was the acceeptance test for sorting struct arrays wiithout any specific field.

@pitrou
Copy link
Member

pitrou commented May 4, 2022

@amol- What is the status on this? Do you want to take it up?

@amol-
Copy link
Member Author

amol- commented May 4, 2022

@amol- What is the status on this? Do you want to take it up?

I think I can take it up if @Jedi18 can't work on it anymore. Or even better would @AlvinJ15 be able to tackle it? He can probably do it faster than me.

@Jedi18
Copy link
Contributor

Jedi18 commented May 4, 2022

Yes I'll probably not get the time to take this on at least till June, so please feel free to assign it to someone else. Sorry for holding up this PR for so long!

@jorisvandenbossche
Copy link
Member

@amol- I think this PR can be closed?

The python API additions have been merged in #14781, and the C++ additions are also included in the open PR #14369

@amol-
Copy link
Member Author

amol- commented Mar 22, 2023

@amol- I think this PR can be closed?

The python API additions have been merged in #14781, and the C++ additions are also included in the open PR #14369

I think so, I think it was still open only to take the pieces and move them into the other PRs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants