[Python] Implement a repr for Array and RecordBatch/Table for non-CPU data #41664

@jorisvandenbossche

Currently, if you have a pyarrow Array or RecordBatch/Table object that is backed by non-CPU data, just displaying the object (__repr__) crashes, because our PrettyPrint functionality assumes it deals with data on the CPU.

At a minimum, we should make the repr not crash, for example by first checking whether we have CPU data and, if not, printing only generic information (the array type or the schema) without a preview of the data.
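To make the minimal option concrete, here is a rough sketch of such a guard. It deliberately does not use pyarrow: `FakeArray` is a hypothetical stand-in, and the `is_cpu` flag is an assumption about what the real objects would expose for this check. The point is only the control flow: look at where the data lives before touching it.

```python
from dataclasses import dataclass


@dataclass
class FakeArray:
    """Hypothetical stand-in for a pyarrow Array; not a real pyarrow class."""
    type: str
    length: int
    is_cpu: bool          # assumed flag: True if the buffers live in CPU memory
    values: tuple = ()    # only readable when is_cpu is True


def safe_repr(arr):
    """Build a repr that only reads the data when it is on the CPU."""
    header = f"<{type(arr).__name__} type={arr.type} length={arr.length}>"
    if not arr.is_cpu:
        # Non-CPU data: print generic information only, no data preview.
        return header + "\n[data on non-CPU device, preview unavailable]"
    # CPU data: safe to read and show the values.
    return header + "\n" + repr(list(arr.values))


if __name__ == "__main__":
    print(safe_repr(FakeArray("int64", 3, True, (1, 2, 3))))
    print(safe_repr(FakeArray("int64", 3, False)))
```

The same shape would apply to RecordBatch/Table, printing the schema instead of the array type.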

But I think we could also do better by actually ensuring the repr works and is informative for non-CPU data as well. For the pretty-printing part of the repr, we only need a small subset of the data (by default the first and last 5 elements), and copying such a portion to the CPU just for printing should generally be fine.
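As an illustration of how little data would need to be copied, the sketch below computes which slices a preview needs and fetches only those. The names `preview_slices`, `gather_preview` and the `copy_slice` callback are hypothetical; `copy_slice(offset, count)` stands in for whatever device-to-CPU copy ends up being available and is not an existing pyarrow API.

```python
def preview_slices(length, window=5):
    """Return (offset, count) pairs covering the elements a preview shows."""
    if length < 0 or window < 0:
        raise ValueError("length and window must be non-negative")
    if length <= 2 * window:
        # Short array: head and tail would overlap, so take everything.
        return [(0, length)]
    # Long array: only the first and last `window` elements are needed.
    return [(0, window), (length - window, window)]


def gather_preview(length, copy_slice, window=5):
    """Copy only the previewed elements via the supplied callback.

    `copy_slice(offset, count)` is an assumed hook that copies a slice
    to the CPU and returns it as a list.
    """
    return [copy_slice(offset, count)
            for offset, count in preview_slices(length, window)]


if __name__ == "__main__":
    device_data = list(range(1000))  # pretend this lives on a device

    def fake_copy(offset, count):
        # Stand-in for a device-to-CPU copy of one slice.
        return device_data[offset:offset + count]

    head, tail = gather_preview(len(device_data), fake_copy)
    print(head, "...", tail)
```

With the default window, at most 10 elements are copied regardless of the array length, which is why doing this just for printing should be cheap.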

If we implement this on the Python side, it depends on exposing the generic CopyTo functionality (#41126) to copy to the CPU device. However, we could maybe also implement this on the C++ side in PrettyPrint itself? (Taking a quick look at the current implementation, I think that would require quite some refactoring, though.)
