
[Python] Improve performance of serializing object dtype ndarrays #17848

@asfimport

I haven't looked carefully at the hot path for this, but I would expect these two statements to have roughly the same performance (offloading the ndarray serialization to pickle):

In [1]: import pickle

In [2]: import numpy as np

In [3]: import pyarrow as pa

In [4]: arr = np.array(['foo', 'bar', None] * 100000, dtype=object)

In [5]: timeit serialized = pa.serialize(arr).to_buffer()
10 loops, best of 3: 27.1 ms per loop

In [6]: timeit pickled = pickle.dumps(arr)
100 loops, best of 3: 6.03 ms per loop
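
The comparison above can also be reproduced outside IPython. The following is only a minimal sketch, not the benchmark used for the numbers in this issue: the helper `best_ms` and the `timings` dict are hypothetical names introduced for illustration, and it assumes `numpy` is installed. Because `pa.serialize` may be absent from the installed pyarrow version, that measurement is guarded and skipped when the function is not available.

```python
import pickle
import timeit

import numpy as np

try:
    import pyarrow as pa
except ImportError:  # pyarrow is optional for this sketch
    pa = None


def best_ms(fn, number=10, repeat=3):
    """Return the best per-call time in milliseconds over `repeat` runs."""
    return min(timeit.repeat(fn, number=number, repeat=repeat)) / number * 1e3


# Same array as in the session above: 300,000 object-dtype elements.
arr = np.array(['foo', 'bar', None] * 100000, dtype=object)

timings = {'pickle.dumps': best_ms(lambda: pickle.dumps(arr))}

# Assumption: pa.serialize may be missing in the installed pyarrow,
# so it is only timed when the attribute is present.
if pa is not None and hasattr(pa, 'serialize'):
    timings['pa.serialize'] = best_ms(lambda: pa.serialize(arr).to_buffer())

for name, ms in timings.items():
    print(f'{name}: {ms:.2f} ms per call')
```

Taking the minimum over several repeats mirrors the "best of 3" figure that `timeit` reports in the session; absolute numbers will differ by machine and library version.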

@robertnishihara @pcmoritz I encountered this while working on ARROW-1783, but it can likely be resolved independently.

Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm

Note: This issue was originally created as ARROW-1854. Please see the migration documentation for further details.
