Closed
Description
The original description is at pandas-dev/pandas#32587
Code Sample, a copy-pastable example if possible
import pandas as pd
from datetime import datetime, timezone
df = pd.DataFrame.from_records([
    (1, datetime.now().replace(tzinfo=timezone.utc)),
    (2, datetime.now().replace(tzinfo=timezone.min))],
    columns=["1", "2"])
print(df["2"])
print()
df.to_feather("/tmp/1")
df2 = pd.read_feather("/tmp/1")
print(df2["2"])
This code will output:
0 2020-03-10 18:13:49.405598+00:00
1 2020-03-10 18:13:49.405626-23:59
Name: 2, dtype: object
0 2020-03-10 18:13:49.405598
1 2020-03-10 18:13:49.405626
Name: 2, dtype: datetime64[ns]
Problem description
After the round trip, the column dtype changes from object (correct) to datetime64[ns] (incorrect). The timezone offsets are discarded in Arrow, so the timestamps no longer identify the original instants.
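To illustrate why dropping the offsets invalidates the values, here is a minimal standard-library sketch. It is not a fix for Arrow or pandas, only a possible workaround: normalize every value to UTC before building the DataFrame. The helper name `to_utc` is hypothetical, and the fixed wall-clock time stands in for `datetime.now()` so the result is reproducible.

```python
from datetime import datetime, timedelta, timezone


def to_utc(values):
    """Convert timezone-aware datetimes to UTC; reject naive ones.

    Hypothetical helper for illustration, not part of pandas or pyarrow.
    """
    out = []
    for value in values:
        if value.tzinfo is None:
            raise ValueError("naive datetime has no offset to preserve")
        out.append(value.astimezone(timezone.utc))
    return out


# Two readings with the same wall-clock time but different UTC offsets,
# mirroring the +00:00 and -23:59 rows in the report.
wall = datetime(2020, 3, 10, 18, 13, 49, 405598)
mixed = [wall.replace(tzinfo=timezone.utc), wall.replace(tzinfo=timezone.min)]

# Dropping tzinfo, as the round-trip output above suggests happened,
# makes two different instants compare equal.
stripped = [value.replace(tzinfo=None) for value in mixed]

# Normalizing to UTC first keeps the instants distinct.
normalized = to_utc(mixed)
gap = normalized[1] - normalized[0]
```

With a single shared offset, the values no longer depend on per-row tzinfo objects, at the cost of losing the original offsets themselves.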
Expected Output
(identical)
0 2020-03-10 18:13:49.405598+00:00
1 2020-03-10 18:13:49.405626-23:59
Name: 2, dtype: object
0 2020-03-10 18:13:49.405598+00:00
1 2020-03-10 18:13:49.405626-23:59
Name: 2, dtype: object
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.0-40-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.0.1
numpy : 1.17.4
pytz : 2019.2
dateutil : 2.7.3
pip : 19.3.1
setuptools : 42.0.1
Cython : 0.29.14
pytest : 5.3.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.10.0
pandas_datareader: None
bs4 : 4.8.1
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : 5.3.1
pyxlsb : None
s3fs : None
scipy : 1.2.1
sqlalchemy : 1.3.12
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None
Reporter: Markovtsev Vadim
Related issues:
- [Python] Honor tzinfo information when converting from datetime to pyarrow (is duplicated by)
Note: This issue was originally created as ARROW-8066. Please see the migration documentation for further details.