Description
Describe the enhancement requested
Interoperation between numpy ndarrays and Arrow's ListArray types (ListArray, LargeListArray, FixedSizeListArray) is a bit tricky.
It's hard to construct values: one must convert to a Python list-of-lists first, which is unnecessarily expensive:
>>> import numpy as np
>>> import pyarrow as pa
>>> np_values = np.ones((3, 2), np.float64)
>>> pa_dtype = pa.list_(pa.float64())
>>> pa_values = pa.array(np_values, type=pa_dtype)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pyarrow/array.pxi", line 323, in pyarrow.lib.array
File "pyarrow/array.pxi", line 83, in pyarrow.lib._ndarray_to_array
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: only handle 1-dimensional arrays
>>> pa_values = pa.array(np_values.tolist(), type=pa_dtype)
>>> pa_values
<pyarrow.lib.ListArray object at 0x11ba433a0>
[
[
1,
1
],
[
1,
1
],
[
1,
1
]
]
Likewise, converting a PyArrow ListArray type to a numpy ndarray is tricky, as described in #35622. That issue describes the difficulty with FixedSizeListArrays, but the same is true of ListArrays, which often have equal-length lists in every entry, making them amenable to presentation as an ndarray.
I'd like to propose the following 6 new methods:
- FixedSizeListArray.from_numpy_ndarray(values, type):
  Constructs a new FixedSizeListArray from values, which must be a numpy ndarray with ndim == 2.
  type is optional; it will be looked up from the ndarray's dtype if unset.
  If type is set, values of the ndarray's dtype must be convertible to the provided type.
- FixedSizeListArray.to_numpy_ndarray(self):
  Returns the FixedSizeListArray's values as a numpy ndarray with a shape of (len(self), self.type.list_size).
  If any of the FixedSizeListArray's values are null, raises an error.
  If any of the FixedSizeListArray's values contain a null, then returns an ndarray with nan in the null spots and dtype set to float64, or None in the null spots and dtype of object if a conversion to float64 is not possible. This matches the behavior of Array.to_numpy for primitive types.
- ListArray.from_numpy_ndarray(values, type):
  Works just like FixedSizeListArray.from_numpy_ndarray.
- ListArray.to_numpy_ndarray(self):
  Works like FixedSizeListArray.to_numpy_ndarray, with an additional check that all list elements are of equal length. If any are different, then raises an error.

The same applies to LargeListArray as to ListArray, bringing the total to 6.
The FixedSizeListArray methods already have an implementation in the FixedShapeTensor extension type. Those implementations are actually a bit more complicated because of tensors' support for permutations:
arrow/python/pyarrow/array.pxi, line 3149 in 95c33d8:
def to_numpy_ndarray(self):
arrow/python/pyarrow/array.pxi, line 3164 in 95c33d8:
def from_numpy_ndarray(obj):
Component(s)
Python