[Python] Pretty printing very large ChunkedArray objects can use unbounded memory #20692

Description

In working on ARROW-2970, I have the following dataset:

import numpy as np
import pyarrow as pa

# One 1-byte element followed by 2048 elements of 1 MiB each (~2 GiB of binary data)
values = [b'x'] + [
    b'x' * (1 << 20)
] * 2 * (1 << 10)

arr = np.array(values)
arrow_arr = pa.array(arr)

The resulting arrow_arr is a ChunkedArray with 129 chunks, whose elements are each 1 MB of binary data. The repr of this object is over 600 MB:

In [10]: rep = repr(arrow_arr)

In [11]: len(rep)
Out[11]: 637536258

There are probably a number of failsafes we can implement to avoid badness in these pathological cases. They may not happen often, but given the kinds of bug reports we are seeing, people do have datasets that look like this.
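
One possible failsafe, sketched below, is to bound the preview rather than the data: truncate each element's printed value and cap the number of elements shown, so the output size stays small regardless of the array size. This is only an illustrative sketch, not pyarrow's actual repr code; bounded_preview and its parameters are hypothetical names.

def bounded_preview(chunked_arr, max_elements=10, max_bytes_per_element=32):
    # Hypothetical helper: build a size-bounded text preview of a
    # pyarrow.ChunkedArray so the string stays small even for huge inputs.
    lines = []
    shown = 0
    for chunk in chunked_arr.iterchunks():
        for value in chunk:
            if shown >= max_elements:
                lines.append("  ...")
                return "\n".join(lines)
            raw = value.as_py()
            if isinstance(raw, bytes) and len(raw) > max_bytes_per_element:
                # Show only a prefix of long binary values, plus the true length.
                lines.append("  %r ... (%d bytes total)"
                             % (raw[:max_bytes_per_element], len(raw)))
            else:
                lines.append("  %r" % (raw,))
            shown += 1
    return "\n".join(lines)

With the dataset above, print(bounded_preview(arrow_arr)) emits about ten short lines instead of a 600 MB string.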

Reporter: Wes McKinney / @wesm

Note: This issue was originally created as ARROW-4099. Please see the migration documentation for further details.
