Open
Description
Describe the bug, including details regarding any error messages, version, and platform.
from datetime import datetime

import pyarrow as pa
import pyarrow.compute as pc
import duckdb
print(duckdb.sql(
"""
from values (timestamp '1970-01-01') df(a)
select time_bucket('3 years', "a", timestamp '1970-01-01')
"""
))
print(pc.floor_temporal(pa.array([datetime(1970, 1, 1)]), 3, 'year'))

Outputs:
┌────────────────────────────────────────────────────────────┐
│ time_bucket('3 years', a, CAST('1970-01-01' AS TIMESTAMP)) │
│ timestamp │
├────────────────────────────────────────────────────────────┤
│ 1970-01-01 00:00:00 │
└────────────────────────────────────────────────────────────┘
[
1968-01-01 00:00:00.000000
]
The DuckDB output differs from the PyArrow one. The PyArrow docs say:
"By default, the origin is 1970-01-01T00:00:00."
Given that, I would expect PyArrow's result to match DuckDB's when timestamp '1970-01-01' is specified as the origin.
In fact, if I use 36, 'month', then PyArrow also returns '1970-01-01'. The fact that 3, 'year' gives a different result from 3*12, 'month' suggests to me that there's a bug.
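One possible explanation, offered as an assumption and not verified against the Arrow source, is that the 'year' unit floors the calendar year number itself to a multiple of 3 (1970 // 3 * 3 = 1968), while the 'month' unit counts months from the 1970-01-01 origin. The pure-Python sketch below reproduces both outputs under that assumption; the helper names `floor_years_from_year_zero` and `floor_months_from_origin` are hypothetical and are not PyArrow APIs.

```python
from datetime import datetime


def floor_years_from_year_zero(ts, multiple):
    # Assumed behaviour for unit='year': round the calendar year number
    # down to a multiple, ignoring the 1970 origin.
    return datetime(ts.year // multiple * multiple, 1, 1)


def floor_months_from_origin(ts, multiple, origin=datetime(1970, 1, 1)):
    # Assumed behaviour for unit='month': count whole months since the
    # origin, then round that count down to a multiple.
    months = (ts.year - origin.year) * 12 + (ts.month - origin.month)
    floored = months // multiple * multiple
    total = origin.year * 12 + (origin.month - 1) + floored
    return datetime(total // 12, total % 12 + 1, 1)


ts = datetime(1970, 1, 1)
print(floor_years_from_year_zero(ts, 3))  # 1968-01-01 00:00:00
print(floor_months_from_origin(ts, 36))   # 1970-01-01 00:00:00
```

If this is what happens, only the month-based result is consistent with the documented origin, which would match DuckDB's time_bucket output.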
In [6]: pc.floor_temporal(arr, 3, 'year')
Out[6]:
<pyarrow.lib.TimestampArray object at 0x7fe44180fca0>
[
1968-01-01 00:00:00.000000
]
In [7]: pc.floor_temporal(arr, 3*12, 'month')
Out[7]:
<pyarrow.lib.TimestampArray object at 0x7fe443a39540>
[
1970-01-01 00:00:00.000000
]
Component(s)
Python