GH-39812: [Python] Add bindings for ListView and LargeListView #39813
jorisvandenbossche
left a comment
Looks good!
python/pyarrow/array.pxi
Outdated
I think an additional disclaimer could be added here: this can also return more values, or values out of order, if certain parts of the values array are not pointed to by the offsets.
In general, given the layout of the "views" list type, there is no guarantee about the content of the values here.
The ListArray.values version has a pointer to flatten(), which would be useful here as well, to explain the difference. But I assume Flatten is not yet implemented for ListView?
Flatten is not yet implemented. I figured the generic List / Array APIs can be tested and fixed in follow-up PRs. I haven't marked that as a TODO anywhere; let me know if you think it's worth adding as a comment!
I will update the comment, great idea.
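To illustrate the point about `values` raised in this thread, here is a minimal pure-Python model of the list-view layout. It does not call PyArrow; the helper name `list_view_to_lists` and the sample data are hypothetical and only meant to show why the raw values buffer can differ from the flattened logical content.

```python
# Pure-Python model of the list-view layout (an illustration, not PyArrow code).
# Each logical list i is the slice values[offsets[i] : offsets[i] + sizes[i]].

def list_view_to_lists(offsets, sizes, values):
    """Materialize the logical lists described by (offsets, sizes)."""
    return [values[o:o + s] for o, s in zip(offsets, sizes)]


values = [10, 20, 30, 40, 50]
offsets = [3, 0]   # offsets need not be increasing for a list-view
sizes = [2, 2]     # the element 30 is not referenced by any list

lists = list_view_to_lists(offsets, sizes, values)
flattened = [v for lst in lists for v in lst]

print(lists)      # [[40, 50], [10, 20]]
print(flattened)  # [40, 50, 10, 20]
print(values)     # [10, 20, 30, 40, 50]
```

In this model the raw `values` list holds an extra element and a different order than the flattened logical lists, which is the situation the suggested disclaimer describes.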
python/pyarrow/array.pxi
Outdated
Shouldn't this be a flat array?
Yes, you're right. I confused this with the logical value representation..
python/pyarrow/array.pxi
Outdated
Ah, I hadn't expected this to work. So those get filled by ListViewArray::FromArrays? In which case this is not zero-copy?
What happens if this is passed together with a mask? EDIT: I see that is checked and raises an error in the C++ code
Hmm, yeah good point, maybe I should leave this example out to deter users from accidentally creating copies.
I will update this to use a null mask instead of None
python/pyarrow/array.pxi
Outdated
This might be a left-over from copy pasting this from offsets. Or can you also pass a null for the sizes to FromArrays?
It does technically work the same. If you use None in either the offsets or sizes arrays, the value will become null.
>>> import pyarrow as pa
>>> offsets = [0]
>>> values = [1]
>>> sizes = [None]
>>> pa.ListViewArray.from_arrays(offsets, sizes, values)
<pyarrow.lib.ListViewArray object at 0x13509a1a0>
[
  null
]
>>> sizes = [0]
>>> pa.ListViewArray.from_arrays(offsets, sizes, values)
<pyarrow.lib.ListViewArray object at 0x13509a3e0>
[
  []
]
I added this to the unit tests in test_array.py
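The distinction discussed in this thread, a null entry versus an empty list, can be sketched with a small pure-Python model. This is only an illustration of the behavior described in the comments, not PyArrow's implementation; the helper name `list_view_to_pylist` is hypothetical.

```python
# Pure-Python sketch (not PyArrow code): a None in either offsets or sizes
# marks the entry as null, while a size of 0 yields an empty list.

def list_view_to_pylist(offsets, sizes, values):
    """Return the logical lists, using None for null entries."""
    result = []
    for offset, size in zip(offsets, sizes):
        if offset is None or size is None:
            result.append(None)  # null entry
        else:
            result.append(values[offset:offset + size])
    return result


null_entry = list_view_to_pylist([0], [None], [1])
empty_entry = list_view_to_pylist([0], [0], [1])

print(null_entry)   # [None]
print(empty_entry)  # [[]]
```

The two results mirror the `null` and `[]` outputs shown in the session above.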
python/pyarrow/array.pxi
Outdated
Suggested change:
- >>> array.offsets
+ >>> array.sizes
Will need to update the below output as well.
Good catch!
python/pyarrow/types.pxi
Outdated
This comment can probably be generalized, mentioning that the "view" type in general might not be supported by all Arrow implementations (regardless of the size?)
The next PR will focus on converting Python objects to ListView arrays. That will enable ListView to be added to the existing ListArray tests. For now, some temporary basic unit testing is included.
jorisvandenbossche
left a comment
Looks good!
python/pyarrow/array.pxi
Outdated
Suggested change:
- values taking into consideration the array's offset.
+ values taking into consideration the array's order and offset.
+1 I also updated LargeListViewArray.values() docstrings
python/pyarrow/array.pxi
Outdated
Maybe also show array itself first (so you can compare with that output)?
python/pyarrow/array.pxi
Outdated
Maybe here show both array and array.values outputs as well so it's easier to compare the different ones?
python/pyarrow/array.pxi
Outdated
Suggested change:
- def flatten(self, pool=None):
+ def flatten(self, memory_pool=None):
We are not very consistent about it, but at least in this file we use memory_pool more than pool (something to clean up at some point).
python/pyarrow/array.pxi
Outdated
Suggested change:
- def flatten(self, pool=None):
+ def flatten(self, memory_pool=None):
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Force-pushed from 3155cea to 9bf1673.
jorisvandenbossche
left a comment
Thanks!
…pache#39813)

### Rationale for this change

Add bindings to the ListView and LargeListView array formats.

### What changes are included in this PR?

* Add initial implementation for ListView and LargeListView
* Add basic unit tests

### Are these changes tested?

* Basic unit tests only (follow up PRs will be needed to implement full functionality)

### Are there any user-facing changes?

Yes, documentation is updated in this PR to include the new PyArrow objects.

* Closes: apache#39812

Lead-authored-by: Dane Pitkin <dane@voltrondata.com>
Co-authored-by: Dane Pitkin <48041712+danepitkin@users.noreply.github.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Rationale for this change
Add bindings to the ListView and LargeListView array formats.
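What changes are included in this PR?
* Add initial implementation for ListView and LargeListView
* Add basic unit tests
Are these changes tested?
* Basic unit tests only (follow up PRs will be needed to implement full functionality)
Are there any user-facing changes?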
Yes, documentation is updated in this PR to include the new PyArrow objects.