Closed
Description
The following code
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd
n=3
df = pd.DataFrame({'x': range(n)}, index=pd.DatetimeIndex(start='2017-01-01', freq='1n', periods=n))
pq.write_table(pa.Table.from_pandas(df), '/tmp/t.parquet')
results in:
ArrowInvalid: Casting from timestamp[ns] to timestamp[us] would lose data: 1483228800000000001
The desired behavior is that nanosecond-resolution timestamps can be saved without losing precision (e.g. through an implicit conversion to a coarser unit such as ms). Note that if freq='1u' is used, the code runs properly.
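To illustrate why the '1n' index fails while the '1u' index succeeds, here is a minimal pure-Python sketch of a lossy-cast check. It is not Arrow's actual implementation; the helper names `ns_to_us_checked` and `first_lossy` are hypothetical and exist only for this illustration. The idea is that a nanosecond value converts cleanly to microseconds only when it is an exact multiple of 1000.

```python
NS_PER_US = 1_000  # nanoseconds per microsecond


def ns_to_us_checked(value_ns: int) -> int:
    """Convert nanoseconds to microseconds, refusing to drop sub-microsecond digits.

    Hypothetical helper for illustration only; it mimics the kind of check
    that produces the error message above.
    """
    quotient, remainder = divmod(value_ns, NS_PER_US)
    if remainder != 0:
        raise ValueError(
            f"Casting from timestamp[ns] to timestamp[us] would lose data: {value_ns}"
        )
    return quotient


def first_lossy(values_ns):
    """Return the first value that cannot be cast without loss, or None."""
    for value in values_ns:
        try:
            ns_to_us_checked(value)
        except ValueError:
            return value
    return None


# 2017-01-01T00:00:00 UTC expressed as nanoseconds since the Unix epoch.
base_ns = 1483228800000000000

# Three timestamps spaced 1 ns apart (analogous to freq='1n').
ns_spaced = [base_ns + i for i in range(3)]

# Three timestamps spaced 1 us apart (analogous to freq='1u').
us_spaced = [base_ns + i * NS_PER_US for i in range(3)]

print(first_lossy(ns_spaced))  # the second timestamp is the first to fail
print(first_lossy(us_spaced))  # every value is a whole number of microseconds
```

The first value of the '1n' series is a whole number of microseconds and passes; the second one, 1483228800000000001, is the value reported in the error.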
Environment: Python 3.6.4. Mac OS X and CentOS Linux release 7.3.1611. Pandas 0.21.1.
Reporter: Jordan Samuels
Assignee: TP Boudreau / @tpboudreau
Related issues:
- [Python] Cast all timestamp resolutions to INT96 use_deprecated_int96_timestamps=True (is related to)
- [C++] Support for writing TIMESTAMP_NANOS Parquet metadata (is related to)
- [C++] Upgrade to use LogicalType annotations instead of ConvertedType (depends upon)
Note: This issue was originally created as ARROW-1957. Please see the migration documentation for further details.