Description
When converting datetime objects to an array, type inference appears to ignore any tzinfo on the datetime object and stores the values as naive timestamp columns instead.
Python code:
import datetime
import pytz
import pyarrow as pa
naive_datetime = datetime.datetime(2019, 1, 13, 12, 11, 10)
utc_datetime = datetime.datetime(2019, 1, 13, 12, 11, 10, tzinfo=pytz.utc)
tzaware_datetime = utc_datetime.astimezone(pytz.timezone('America/Los_Angeles'))
def inspect(varname):
    print(varname)
    arr = globals()[varname]
    print(arr.type)
    print(arr)
    print()
auto_naive_arr = pa.array([naive_datetime])
inspect("auto_naive_arr")
auto_utc_arr = pa.array([utc_datetime])
inspect("auto_utc_arr")
auto_tzaware_arr = pa.array([tzaware_datetime])
inspect("auto_tzaware_arr")
auto_mixed_arr = pa.array([utc_datetime, tzaware_datetime])
inspect("auto_mixed_arr")
naive_type = pa.timestamp("us", naive_datetime.tzname())
utc_type = pa.timestamp("us", utc_datetime.tzname())
tzaware_type = pa.timestamp("us", tzaware_datetime.tzname())
naive_arr = pa.array([naive_datetime], type=naive_type)
inspect("naive_arr")
utc_arr = pa.array([utc_datetime], type=utc_type)
inspect("utc_arr")
tzaware_arr = pa.array([tzaware_datetime], type=tzaware_type)
inspect("tzaware_arr")
mixed_arr = pa.array([utc_datetime, tzaware_datetime], type=utc_type)
inspect("mixed_arr")
This prints:
$ python detect_timezone.py
auto_naive_arr
timestamp[us]
[
1547381470000000
]
auto_utc_arr
timestamp[us]
[
1547381470000000
]
auto_tzaware_arr
timestamp[us]
[
1547352670000000
]
auto_mixed_arr
timestamp[us]
[
1547381470000000,
1547352670000000
]
naive_arr
timestamp[us]
[
1547381470000000
]
utc_arr
timestamp[us, tz=UTC]
[
1547381470000000
]
tzaware_arr
timestamp[us, tz=PST]
[
1547352670000000
]
mixed_arr
timestamp[us, tz=UTC]
[
1547381470000000,
1547352670000000
]
But I would expect the following types instead:
- auto_naive_arr: timestamp[us]
- auto_utc_arr: timestamp[us, tz=UTC]
- auto_tzaware_arr: timestamp[us, tz=PST] (Or maybe tz='America/Los_Angeles'. I'm not sure why pytz returns PST as the tzname.)
- auto_mixed_arr: timestamp[us, tz=UTC]
Also, in the "mixed" case, I'd expect the actual stored microseconds to be the same for both rows, since utc_datetime and tzaware_datetime both refer to the same point in time. It seems reasonable for any naive datetime objects mixed in with tz-aware datetimes to be interpreted as UTC.
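To make the expectation for the "mixed" case concrete, here is a minimal sketch that uses only the standard library. It does not call pyarrow, and a fixed UTC-8 offset stands in for pytz's America/Los_Angeles zone on this date. The helper names normalize_to_utc and to_epoch_us are hypothetical, and treating naive values as UTC is only the suggestion made above, not confirmed pyarrow behavior.

```python
import datetime

UTC = datetime.timezone.utc
EPOCH = datetime.datetime(1970, 1, 1, tzinfo=UTC)


def normalize_to_utc(dt):
    """Return dt as a tz-aware UTC datetime.

    Assumption: naive datetimes are interpreted as UTC, as suggested above.
    """
    if dt.tzinfo is None or dt.utcoffset() is None:
        return dt.replace(tzinfo=UTC)
    return dt.astimezone(UTC)


def to_epoch_us(dt):
    """Microseconds since the Unix epoch, using exact integer arithmetic."""
    return (normalize_to_utc(dt) - EPOCH) // datetime.timedelta(microseconds=1)


# Fixed offset standing in for America/Los_Angeles on this date (assumption).
PST = datetime.timezone(datetime.timedelta(hours=-8), "PST")

utc_datetime = datetime.datetime(2019, 1, 13, 12, 11, 10, tzinfo=UTC)
tzaware_datetime = utc_datetime.astimezone(PST)

# Both objects denote the same instant, so the stored values should match.
expected = [to_epoch_us(utc_datetime), to_epoch_us(tzaware_datetime)]
print(expected)  # [1547381470000000, 1547381470000000]

# Dropping tzinfo and keeping the local wall-clock time reproduces the
# second value printed for auto_mixed_arr in the report.
wall_clock = tzaware_datetime.replace(tzinfo=None)
print(to_epoch_us(wall_clock))  # 1547352670000000
```

The 28,800,000,000 microsecond gap between the two values is exactly the 8-hour PST offset, which suggests the conversion keeps the local wall-clock time and discards the offset. Normalizing values to UTC like this before calling pa.array with an explicit type such as pa.timestamp("us", "UTC") might serve as a workaround, though that has not been verified here against pyarrow 0.12.1.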
Environment:
$ python --version
Python 3.7.2
$ pip freeze
numpy==1.16.2
pyarrow==0.12.1
pytz==2018.9
six==1.12.0
$ sw_vers
ProductName: Mac OS X
ProductVersion: 10.14.3
BuildVersion: 18D109
(pyarrow)
Reporter: Tim Swast / @tswast
Assignee: Krisztian Szucs / @kszucs
Related issues:
- [Python] Honor tzinfo information when converting from datetime to pyarrow (is fixed by)
- [Python] Honor tzinfo information when converting from datetime to pyarrow (relates to)
Note: This issue was originally created as ARROW-4965. Please see the migration documentation for further details.