[Python] Fixed size lists of numeric types without nulls could be converted to numpy with zero-copy #35622

@spenczar

Describe the enhancement requested

Today, FixedSizeListArray.to_numpy() seems to require zero_copy_only=False even when the underlying data could be exposed without a copy.

For example:

Create a fixed size list array with 3 rows and a list size of 2:

import pyarrow as pa
data = pa.FixedSizeListArray.from_arrays([1, 2, 3, 4, 5, 6], 2)
print(data)
<pyarrow.lib.FixedSizeListArray object at 0x104abe140>
[
  [
    1,
    2
  ],
  [
    3,
    4
  ],
  [
    5,
    6
  ]
]

Calling to_numpy() throws an error:

data.to_numpy()
ArrowInvalid: Needed to copy 1 chunks with 0 nulls, but zero_copy_only was True

But if I work with buffers directly, I can easily get it to work:

import numpy as np
# buffers()[2] is the data buffer of the child values array
nparray = np.frombuffer(data.buffers()[2], dtype=np.int64)
print(nparray)
[1 2 3 4 5 6]

We also know enough to even give it the right ndarray shape:

nparray_shaped = np.frombuffer(data.buffers()[2], dtype=np.int64).reshape(len(data), data.type.list_size)
print(nparray_shaped)
[[1 2]
 [3 4]
 [5 6]]

I propose that FixedSizeListArray.to_numpy() should return a numpy array with zero copy when the FixedSizeList's value type is an integer or floating point type, since those are safe to convert, and when no nulls are present.

I also propose that it reshape the output to be an ndarray which matches the FixedSizeList's shape.

Component(s)

Python
