Description
Currently, if you have a pyarrow Array or RecordBatch/Table object that is backed by non-CPU data, simply displaying the object (`__repr__`) crashes, because our PrettyPrint functionality assumes the data resides in CPU memory.
At a minimum, we should make the repr not crash. For example, we could first check whether the data is on the CPU and, if not, print only generic information (the array type or the schema) without a preview of the data.
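A minimal sketch of that fallback, assuming a hypothetical `DeviceArray` stand-in class (not a pyarrow type) whose `is_cpu` flag records whether the buffers are CPU-accessible. The only point it illustrates is that the repr checks the device first and never touches the data when it is not on the CPU:

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class DeviceArray:
    """Hypothetical stand-in for an Arrow array whose buffers may live on a GPU."""
    type_name: str
    length: int
    is_cpu: bool
    values: List[Any] = field(default_factory=list)  # models the raw buffers


def safe_repr(arr: DeviceArray) -> str:
    # Metadata (type, length) is always safe to print: it lives on the host.
    header = f"<DeviceArray type={arr.type_name} length={arr.length}>"
    if not arr.is_cpu:
        # Do not dereference device memory; fall back to generic information only.
        return header + "\n<data on non-CPU device, preview not shown>"
    # CPU data: the existing pretty-printing path can be used unchanged.
    return header + "\n" + repr(arr.values)


print(safe_repr(DeviceArray("int64", 3, is_cpu=False, values=[1, 2, 3])))
```

The same check would apply to RecordBatch/Table, where the generic information would be the schema rather than the array type.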
But I think we could also do better by actually making the repr work and be informative for non-CPU data as well. For the pretty-printing part of the repr, we only need a small subset of the data (by default the first and last 5 elements), and copying such a portion to the CPU just for printing should generally be fine.
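As a rough sketch of how little data would need to move: for an array of length `n` and a preview window `w`, at most `2 * w` elements are copied, either the whole array when `n <= 2 * w` or the two ranges `[0, w)` and `[n - w, n)`. The `copy_slice(offset, count)` callback below is a hypothetical placeholder for "copy this slice to the CPU" and is not an existing pyarrow function:

```python
from typing import Callable, Iterable, List, Tuple


def preview_ranges(length: int, window: int = 5) -> List[Tuple[int, int]]:
    """Return the (offset, count) ranges that must be copied to the CPU."""
    if length <= 2 * window:
        return [(0, length)]  # small array: copy everything
    return [(0, window), (length - window, window)]  # head and tail only


def format_preview(
    length: int,
    copy_slice: Callable[[int, int], Iterable],  # hypothetical device-to-CPU copy
    window: int = 5,
) -> str:
    chunks = [list(copy_slice(off, cnt)) for off, cnt in preview_ranges(length, window)]
    items = [repr(v) for v in chunks[0]]
    if len(chunks) == 2:
        # Elide the middle, as the existing PrettyPrint output does.
        items += ["..."] + [repr(v) for v in chunks[1]]
    return "[" + ", ".join(items) + "]"


# Simulate a 100-element device array; only 10 elements are "copied".
print(format_preview(100, lambda off, cnt: range(off, off + cnt)))
```

The copy cost is therefore bounded by the window size, independent of the array length.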
If we implement this on the Python side, it depends on exposing the generic CopyTo functionality (#41126) to copy to the CPU device. However, we could perhaps also implement this on the C++ side, in PrettyPrint itself? (From a quick look at the current implementation, I think that would require a fair amount of refactoring, though.)