[C++][Python] Incorrect result for floor_temporal with 3 and 'year' #46301

@MarcoGorelli

Describe the bug, including details regarding any error messages, version, and platform.

from datetime import datetime

import duckdb
import pyarrow as pa
import pyarrow.compute as pc

print(duckdb.sql(
    """
    from values (timestamp '1970-01-01') df(a)
    select time_bucket('3 years', "a", timestamp '1970-01-01')
    """
))
print(pc.floor_temporal(pa.array([datetime(1970, 1, 1)]), 3, 'year'))

Outputs:

┌────────────────────────────────────────────────────────────┐
│ time_bucket('3 years', a, CAST('1970-01-01' AS TIMESTAMP)) │
│                         timestamp                          │
├────────────────────────────────────────────────────────────┤
│ 1970-01-01 00:00:00                                        │
└────────────────────────────────────────────────────────────┘

[
  1968-01-01 00:00:00.000000
]

The DuckDB output differs from the PyArrow one. Given that the pyarrow docs say

By default, the origin is 1970-01-01T00:00:00.

I would expect PyArrow's result to match DuckDB's when DuckDB is given timestamp '1970-01-01' as the origin.

In fact, if I use 36, 'month', then PyArrow also returns '1970-01-01'. The fact that 3, 'year' differs from 3*12, 'month' suggests to me that there's a bug.

In [6]: pc.floor_temporal(arr, 3, 'year')
Out[6]: 
<pyarrow.lib.TimestampArray object at 0x7fe44180fca0>
[
  1968-01-01 00:00:00.000000
]

In [7]: pc.floor_temporal(arr, 3*12, 'month')
Out[7]: 
<pyarrow.lib.TimestampArray object at 0x7fe443a39540>
[
  1970-01-01 00:00:00.000000
]
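
One possible reading of the two results above, which is an assumption and not something confirmed in this issue, is that the 'year' unit buckets on the calendar year number itself (anchored at year 0), while the 'month' unit buckets on months counted from 1970-01. The plain-Python sketch below only reproduces that arithmetic; the function names are hypothetical and none of this is PyArrow's actual code.

```python
# Hypothetical helpers illustrating two ways to floor to a multiple of years.
# These are NOT PyArrow functions; they only mimic the suspected arithmetic.

def floor_year_zero_anchored(year: int, multiple: int) -> int:
    """Floor the calendar year number to a multiple, counting from year 0."""
    return (year // multiple) * multiple


def floor_year_origin_anchored(year: int, multiple: int, origin_year: int = 1970) -> int:
    """Floor the year to a multiple, counting buckets from origin_year."""
    return origin_year + ((year - origin_year) // multiple) * multiple


def floor_month_origin_anchored(
    year: int, month: int, multiple: int, origin_year: int = 1970
) -> tuple[int, int]:
    """Floor (year, month) to a multiple of months counted from origin_year-01."""
    months_since_origin = (year - origin_year) * 12 + (month - 1)
    floored = (months_since_origin // multiple) * multiple
    return origin_year + floored // 12, floored % 12 + 1


print(floor_year_zero_anchored(1970, 3))          # 1968, like 3, 'year' above
print(floor_year_origin_anchored(1970, 3))        # 1970, like DuckDB's time_bucket
print(floor_month_origin_anchored(1970, 1, 36))   # (1970, 1), like 36, 'month' above
```

If this reading is right, 1970 falls in the bucket starting at 1968 only because 1968 is the nearest multiple of 3 at or below 1970, which has nothing to do with the documented 1970-01-01 origin.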

Component(s)

Python
