Description
If a large array is sliced and pickled, it seems the full buffer is serialized. This leads to excessive memory usage and data transfer when using multiprocessing or dask.
>>> import pyarrow as pa
>>> ar = pa.array(['foo'] * 100_000)
>>> ar.nbytes
700004
>>> import pickle
>>> len(pickle.dumps(ar.slice(10, 1)))
700165
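As far as I understand, the slice is zero-copy: it keeps references to the parent's full buffers and only adjusts the offset and length, which would explain the pickle size. A quick check of the total buffer size behind the one-element slice (assuming Array.buffers() returns the shared, untrimmed buffers) illustrates this:
>>> sliced = ar.slice(10, 1)
>>> sum(buf.size for buf in sliced.buffers() if buf is not None)  # roughly ar.nbytes, not one element's worth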
With NumPy, for instance, pickling a slice only serializes the sliced data:
>>> import numpy as np
>>> ar_np = np.array(ar)
>>> ar_np
array(['foo', 'foo', 'foo', ..., 'foo', 'foo', 'foo'], dtype=object)
>>> import pickle
>>> len(pickle.dumps(ar_np[10:11]))
165
I think this makes sense if you know Arrow, but it is kind of unexpected as a user.
Is there a workaround for this? For instance, copying an Arrow array to get rid of the offset and trimming the buffers?
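A sketch of one possible workaround (my assumption, not an official compaction API): rebuild the array so its buffers contain only the sliced values before pickling, either by round-tripping through Python objects, or via pa.concat_arrays if that copies the data (which may depend on the pyarrow version).
>>> sliced = ar.slice(10, 1)
>>> compact = pa.array(sliced.to_pylist(), type=sliced.type)  # copies only the sliced values
>>> len(pickle.dumps(compact))  # should be small, comparable to the NumPy case
>>> compact2 = pa.concat_arrays([sliced])  # assumption: concatenation rebuilds trimmed, contiguous buffers
>>> len(pickle.dumps(compact2))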
Reporter: Maarten Breddels / @maartenbreddels
Assignee: Clark Zinzow
Related issues:
Note: This issue was originally created as ARROW-10739. Please see the migration documentation for further details.