[Python] Write nanosecond timestamps using new NANO LogicalType Parquet unit #17945

@asfimport

Description

The following code

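import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd

n = 3
# Three timestamps spaced 1 nanosecond apart, used as the DataFrame index.
# pd.date_range is equivalent to the DatetimeIndex(start=..., freq=..., periods=...)
# constructor form, which later pandas releases removed.
index = pd.date_range(start='2017-01-01', freq='1n', periods=n)
df = pd.DataFrame({'x': range(n)}, index=index)
# Writing the table raises the ArrowInvalid error shown below.
pq.write_table(pa.Table.from_pandas(df), '/tmp/t.parquet')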

results in:

ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: 1483228800000000001

The desired behavior is that nanosecond-resolution timestamps can be written without losing precision (for example, without being converted to a coarser unit such as ms). Note that if freq='1u' is used instead, the code runs without error.
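To illustrate why the 1 ns spacing triggers the error while the 1 us spacing does not, here is a minimal pure-Python sketch of a lossless ns-to-us conversion check. It is not Arrow's actual implementation; the helper names `ns_to_us_lossless` and `first_lossy_value` are hypothetical and exist only for this illustration. The starting value is 2017-01-01T00:00:00 UTC expressed in nanoseconds since the epoch, matching the value in the error message above.

```python
NS_PER_US = 1_000  # nanoseconds per microsecond

# 2017-01-01T00:00:00 UTC as nanoseconds since the Unix epoch.
START_NS = 1_483_228_800_000_000_000


def ns_to_us_lossless(value_ns: int) -> int:
    """Convert ns to us, raising if the value is not a whole number of us.

    Hypothetical helper: mimics the *kind* of check behind the error,
    not Arrow's real casting code.
    """
    quotient, remainder = divmod(value_ns, NS_PER_US)
    if remainder != 0:
        raise ValueError(
            "Casting from timestamp[ns] to timestamp[us] "
            f"would lose data: {value_ns}"
        )
    return quotient


def first_lossy_value(step_ns: int, periods: int = 3):
    """Return the first timestamp that cannot be cast losslessly, or None."""
    for i in range(periods):
        value = START_NS + i * step_ns
        try:
            ns_to_us_lossless(value)
        except ValueError:
            return value
    return None


# 1 ns spacing (freq='1n'): the second timestamp is not a whole microsecond.
print(first_lossy_value(step_ns=1))      # 1483228800000000001
# 1 us spacing (freq='1u'): every timestamp converts cleanly.
print(first_lossy_value(step_ns=1_000))  # None
```

The first timestamp in the index converts cleanly because it falls on a whole second; the failure appears at the second element, which is why the error reports a value ending in ...001.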

Environment: Python 3.6.4, Mac OS X and CentOS Linux release 7.3.1611, pandas 0.21.1.
Reporter: Jordan Samuels
Assignee: TP Boudreau / @tpboudreau

Note: This issue was originally created as ARROW-1957. Please see the migration documentation for further details.
