Open
Description
Describe the enhancement requested
Today, FixedSizeListArray.to_numpy() requires zero_copy_only=False even in cases where a zero-copy conversion appears to be possible.
For example:
Create a fixed-size list array with 3 rows and a list size of 2:
import pyarrow as pa
data = pa.FixedSizeListArray.from_arrays([1, 2, 3, 4, 5, 6], 2)
print(data)
<pyarrow.lib.FixedSizeListArray object at 0x104abe140>
[
  [
    1,
    2
  ],
  [
    3,
    4
  ],
  [
    5,
    6
  ]
]
Calling to_numpy() throws an error:
data.to_numpy()
ArrowInvalid: Needed to copy 1 chunks with 0 nulls, but zero_copy_only was True
But if I work with buffers directly, I can easily get it to work:
import numpy as np
nparray = np.frombuffer(data.buffers()[2], dtype=np.int64)
print(nparray)
array([1, 2, 3, 4, 5, 6])
We also know enough to even give it the right ndarray shape:
nparray_shaped = np.frombuffer(data.buffers()[2], dtype=np.int64).reshape(len(data), data.type.list_size)
print(nparray_shaped)
array([[1, 2],
       [3, 4],
       [5, 6]])
I propose that FixedSizeListArray.to_numpy() return a zero-copy numpy array when the FixedSizeList's value type is an integer or floating point type (those are safe to convert) and no nulls are present.
I also propose that it reshape the output into a 2-D ndarray of shape (len(array), list_size), matching the FixedSizeList's layout.
Component(s)
Python