@spenczar
Contributor

@spenczar spenczar commented Jun 1, 2023

What changes are included in this PR?

#35747 asks for additions to all three list array types (ListArray, FixedSizeListArray, and LargeListArray). This PR adds them only for FixedSizeListArray, because that is both the easiest case to implement and the most obviously useful one.

This implementation is just in Python and follows the general pattern of FixedShapeTensorArrays.

An alternative implementation would be to do this in python/pyarrow/src/arrow/python/numpy_convert.cc. We could detect 2D Numpy arrays and convert them to FixedShapeTensorArrays. That would be a much more significant change; I think it would change the behavior of other calls like pyarrow.array(). It would also allow for much more careful management of copies and memory; it should be possible to implement both to_ and from_ with zero copy for primitive numeric types. However, it would be a lot more complex, and this is my first Arrow contribution, so I figured I'd start a bit smaller.

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes, and I think I added sufficient documentation.

@spenczar spenczar requested a review from AlenkaF as a code owner June 1, 2023 03:47
@github-actions

github-actions bot commented Jun 1, 2023

⚠️ GitHub issue #35747 has been automatically assigned in GitHub to PR creator.

@AlenkaF
Member

AlenkaF commented Jun 19, 2023

Thank you @spenczar for making a contribution!
And sorry it took a while to get a review.

The implementation in Python instead of C++ makes sense. It might also be worth considering whether documenting this case (1-D vs. 2-D NumPy arrays, and how to use 2-D arrays in pyarrow without hitting an error) would be valuable enough on its own, without adding two new functions to the code. I think that could be enough for a user who needs a way to work with 2-D NumPy arrays.

@rok
Member

rok commented May 8, 2024

Hey @spenczar, would you consider:

FixedSizeListArray:

  • "casting" to pa.FixedShapeTensorArray and then using to_numpy
  • pa.FixedShapeTensorArray.from_numpy_ndarray and then getting the fixed-size list storage?

ListArray/LargeListArray:

This could work either currently as a workaround or as machinery for this PR.
In the latter case there's a minor caveat: FixedShapeTensorArray and VariableShapeTensorArray are available only if JSON is enabled at compile time (-DARROW_JSON=ON).

@spenczar
Contributor Author

Cleaning up old un-merged PRs in my queue. I still think this was a good idea but it appears to have no movement; I understand it is blocked by #40354.

@spenczar spenczar closed this Sep 23, 2025
@rok
Member

rok commented Sep 23, 2025

I hope we cycle back to this once #40354 is merged. And sorry for the wait.

Development

Successfully merging this pull request may close these issues.

[Python] Add from_numpy_ndarray and to_numpy_ndarray to ListArray types

3 participants