
@lukemanley (Member)

A few related changes:

  1. ArrowExtensionArray.__getitem__(int) will now return a Timestamp/Timedelta for non-nano timestamp/duration types, consistent with nanosecond types. Previously, non-nano types returned Python-native datetime/timedelta objects.
  2. ArrowExtensionArray.__iter__ will now yield Timestamp/Timedelta objects for non-nano types, consistent with nanosecond types.
  3. ArrowExtensionArray.to_numpy now allows zero-copy conversion for timestamp/duration types.

Submitting as a single PR because a number of tests require consistency across these methods, and splitting the nano/non-nano behavior changes from the performance improvements would be tricky.

These changes were partly motivated by the following benchmark:

# Run in IPython/Jupyter: %timeit is an IPython magic command.
import pandas as pd
import pyarrow as pa

N = 1_000_000
arr = pd.array(range(N), dtype=pd.ArrowDtype(pa.timestamp("s")))

%timeit arr.astype("M8[s]")
# 5.29 s ± 162 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)          -> main
# 137 µs ± 4.88 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)   -> PR

%timeit pd.DatetimeIndex(arr)
# 6.31 s ± 560 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)          -> main
# 67.6 µs ± 3.03 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)  -> PR

@lukemanley added the Performance, Arrow, and Non-Nano labels on May 21, 2023
@lukemanley added this to the 2.1 milestone on May 21, 2023
@mroeschke merged commit 1e61215 into pandas-dev:main on May 22, 2023
@mroeschke (Member)

Thanks @lukemanley

topper-123 pushed a commit to topper-123/pandas that referenced this pull request May 22, 2023
…ance (pandas-dev#53326)

* ENH/PERF: pyarrow timestamp & duration conversion consistency

* gh refs

* typo

* whatsnew
@lukemanley deleted the pyarrow-temporal-conversions branch on May 30, 2023 22:16
Daquisu pushed a commit to Daquisu/pandas that referenced this pull request Jul 8, 2023
…ance (pandas-dev#53326)

* ENH/PERF: pyarrow timestamp & duration conversion consistency

* gh refs

* typo

* whatsnew