Closed
Description
I haven't looked carefully at the hot path for this, but I would expect these two statements to have roughly the same performance (offloading the ndarray serialization to pickle):
In [1]: import pickle
In [2]: import numpy as np
In [3]: import pyarrow as pa
In [4]: arr = np.array(['foo', 'bar', None] * 100000, dtype=object)
In [5]: timeit serialized = pa.serialize(arr).to_buffer()
10 loops, best of 3: 27.1 ms per loop
In [6]: timeit pickled = pickle.dumps(arr)
100 loops, best of 3: 6.03 ms per loop

@robertnishihara @pcmoritz I encountered this while working on ARROW-1783, but it can likely be resolved independently.
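For anyone who wants to reproduce the comparison outside IPython, here is a rough sketch of a standalone script. The helper names (`best_of`, `compare`, `main`) are made up for illustration and are not part of pyarrow. The script also assumes that `pa.serialize` may be missing: as far as I know it was deprecated and later removed in newer pyarrow releases, so that measurement is skipped when the function is not available.

```python
import pickle
import timeit


def best_of(fn, repeat=3, number=10):
    """Best per-call time in seconds over `repeat` runs of `number` calls each."""
    return min(timeit.repeat(fn, repeat=repeat, number=number)) / number


def compare(obj):
    """Time pickle.dumps and, when available, pa.serialize(...).to_buffer()."""
    results = {"pickle.dumps": best_of(lambda: pickle.dumps(obj))}
    try:
        import pyarrow as pa
    except ImportError:
        pa = None
    # Assumption: pa.serialize only exists in older pyarrow versions,
    # so skip that measurement when the attribute is missing.
    if pa is not None and hasattr(pa, "serialize"):
        results["pa.serialize"] = best_of(lambda: pa.serialize(obj).to_buffer())
    return results


def main():
    # Same object array as in the report above.
    import numpy as np

    arr = np.array(['foo', 'bar', None] * 100000, dtype=object)
    for name, seconds in compare(arr).items():
        print(f"{name}: {seconds * 1e3:.2f} ms per call")
```

Calling `main()` prints one line per measured function. Absolute numbers will differ by machine and library version, so only the ratio between the two is meaningful.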
Reporter: Wes McKinney / @wesm
Assignee: Wes McKinney / @wesm
Related issues:
- [Python] Read and write pandas.DataFrame in pyarrow.serialize by decomposing the BlockManager rather than coercing to Arrow format (is related to)
Original Issue Attachments:
PRs and other links:
Note: This issue was originally created as ARROW-1854. Please see the migration documentation for further details.