Skip to content

[Python] Current assertion of CPU-accessible data in Array methods is specific to CPU device type #43511

@jorisvandenbossche

Description

@jorisvandenbossche

For #41665 (implemented for Array in #42112 / #42113), we currently use the following assertion to check if the data is on CPU (and thus supports the operation in question that access the data's address):

arrow/python/pyarrow/array.pxi

Lines 2035 to 2037 in d4d92e4

cdef void _assert_cpu(self) except *:
if self.sp_array.get().device_type() != CDeviceAllocationType_kCPU:
raise NotImplementedError("Implemented only for data on CPU device")

This checks explicitly for the CPU device allocation type.
However, this means that for example data with a CUDA_HOST device type, which is actually accessible from the CPU, will trigger this error:

import numpy as np
import pyarrow as pa
from pyarrow import cuda

# create Array with CudaHost buffer
buf = cuda.new_host_buffer(5*8)
np.frombuffer(buf, dtype=np.int64)[:] = range(5)
arr = pa.Array.from_buffers(pa.int64(), size, [None, buf])

# inspect the array
>>> arr
<pyarrow.lib.Int64Array object at 0x7f24b6e02e00>
[
  0,
  1,
  2,
  3,
  4
]
>>> arr.device_type
<DeviceAllocationType.CUDA_HOST: 3>

# calling a method that checks _assert_cpu errors
>>> arr.sum()
...
NotImplementedError: Implemented only for data on CPU device

# but the underlying buffer itself "is_cpu"
>>> arr.buffers()[1]
<pyarrow.Buffer address=0x7f24c1600400 size=80 is_cpu=True is_mutable=True>
>>> arr.buffers()[1].is_cpu
True
>>> arr.buffers()[1].device_type
<DeviceAllocationType.CUDA_HOST: 3>

At the buffer level we have this is_cpu attribute available, but currently on the Array level we only have device_type(). We could add CUDA_HOST device allocation type explicitly to the check above, but ideally we would use something more general?

(cc @danepitkin)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions