-
Notifications
You must be signed in to change notification settings - Fork 4k
Description
For #41665 (implemented for Array in #42112 / #42113), we currently use the following assertion to check if the data is on CPU (and thus supports the operation in question that access the data's address):
arrow/python/pyarrow/array.pxi
Lines 2035 to 2037 in d4d92e4
| cdef void _assert_cpu(self) except *: | |
| if self.sp_array.get().device_type() != CDeviceAllocationType_kCPU: | |
| raise NotImplementedError("Implemented only for data on CPU device") |
This checks explicitly for the CPU device allocation type.
However, this means that for example data with a CUDA_HOST device type, which is actually accessible from the CPU, will trigger this error:
import numpy as np
import pyarrow as pa
from pyarrow import cuda
# create Array with CudaHost buffer
buf = cuda.new_host_buffer(5*8)
np.frombuffer(buf, dtype=np.int64)[:] = range(5)
arr = pa.Array.from_buffers(pa.int64(), size, [None, buf])
# inspect the array
>>> arr
<pyarrow.lib.Int64Array object at 0x7f24b6e02e00>
[
0,
1,
2,
3,
4
]
>>> arr.device_type
<DeviceAllocationType.CUDA_HOST: 3>
# calling a method that checks _assert_cpu errors
>>> arr.sum()
...
NotImplementedError: Implemented only for data on CPU device
# but the underlying buffer itself "is_cpu"
>>> arr.buffers()[1]
<pyarrow.Buffer address=0x7f24c1600400 size=80 is_cpu=True is_mutable=True>
>>> arr.buffers()[1].is_cpu
True
>>> arr.buffers()[1].device_type
<DeviceAllocationType.CUDA_HOST: 3>At the buffer level we have this is_cpu attribute available, but currently on the Array level we only have device_type(). We could add CUDA_HOST device allocation type explicitly to the check above, but ideally we would use something more general?
(cc @danepitkin)