[Python] Timestamp array type detection should use tzname of datetime.datetime objects #21470

@asfimport

When pa.array() infers the array type from datetime.datetime objects, it appears to ignore any tzinfo set on them and stores the values as naive timestamp columns.

Python code:

import datetime
import pytz
import pyarrow as pa

naive_datetime = datetime.datetime(2019, 1, 13, 12, 11, 10)
utc_datetime = datetime.datetime(2019, 1, 13, 12, 11, 10, tzinfo=pytz.utc)
tzaware_datetime = utc_datetime.astimezone(pytz.timezone('America/Los_Angeles'))

def inspect(varname):
    """Print the name, type, and contents of the module-level array `varname`."""
    print(varname)
    arr = globals()[varname]
    print(arr.type)
    print(arr)
    print()

auto_naive_arr = pa.array([naive_datetime])
inspect("auto_naive_arr")

auto_utc_arr = pa.array([utc_datetime])
inspect("auto_utc_arr")

auto_tzaware_arr = pa.array([tzaware_datetime])
inspect("auto_tzaware_arr")

auto_mixed_arr = pa.array([utc_datetime, tzaware_datetime])
inspect("auto_mixed_arr")

# Build explicit timestamp types from each datetime's tzname() (None for the naive one).
naive_type = pa.timestamp("us", naive_datetime.tzname())
utc_type = pa.timestamp("us", utc_datetime.tzname())
tzaware_type = pa.timestamp("us", tzaware_datetime.tzname())

naive_arr = pa.array([naive_datetime], type=naive_type)
inspect("naive_arr")

utc_arr = pa.array([utc_datetime], type=utc_type)
inspect("utc_arr")

tzaware_arr = pa.array([tzaware_datetime], type=tzaware_type)
inspect("tzaware_arr")

mixed_arr = pa.array([utc_datetime, tzaware_datetime], type=utc_type)
inspect("mixed_arr")

This prints:


$ python detect_timezone.py
auto_naive_arr
timestamp[us]
[
  1547381470000000
]

auto_utc_arr
timestamp[us]
[
  1547381470000000
]

auto_tzaware_arr
timestamp[us]
[
  1547352670000000
]

auto_mixed_arr
timestamp[us]
[
  1547381470000000,
  1547352670000000
]

naive_arr
timestamp[us]
[
  1547381470000000
]

utc_arr
timestamp[us, tz=UTC]
[
  1547381470000000
]

tzaware_arr
timestamp[us, tz=PST]
[
  1547352670000000
]

mixed_arr
timestamp[us, tz=UTC]
[
  1547381470000000,
  1547352670000000
]

But I would expect the following types instead:

  • auto_naive_arr: timestamp[us]

  • auto_utc_arr: timestamp[us, tz=UTC]

  • auto_tzaware_arr: timestamp[us, tz=PST] (or maybe tz='America/Los_Angeles'; I'm not sure why pytz returns PST as the tzname)

  • auto_mixed_arr: timestamp[us, tz=UTC]

Also, in the "mixed" case, I'd expect the stored microsecond values to be the same for both rows, since utc_datetime and tzaware_datetime both refer to the same point in time. It seems reasonable for any naive datetime objects mixed in with tz-aware datetimes to be interpreted as UTC.
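The expectation that both rows of the mixed array should hold the same microsecond value can be checked with the standard library alone. The sketch below is only an illustration, not pyarrow's implementation: to_utc_micros is a hypothetical helper, a fixed UTC-8 offset stands in for pytz's America/Los_Angeles zone on this date, and reading naive datetimes as UTC is the assumption suggested above.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)

def to_utc_micros(dt):
    """Hypothetical helper: microseconds since the Unix epoch, in UTC."""
    if dt.tzinfo is None:
        # Assumption from the proposal above: naive datetimes are read as UTC.
        dt = dt.replace(tzinfo=datetime.timezone.utc)
    # Subtracting two aware datetimes accounts for their UTC offsets.
    return (dt - EPOCH) // datetime.timedelta(microseconds=1)

# Fixed-offset stand-in for America/Los_Angeles in January (PST, UTC-8).
pst = datetime.timezone(datetime.timedelta(hours=-8), "PST")

naive_datetime = datetime.datetime(2019, 1, 13, 12, 11, 10)
utc_datetime = naive_datetime.replace(tzinfo=datetime.timezone.utc)
tzaware_datetime = utc_datetime.astimezone(pst)

print(to_utc_micros(utc_datetime))      # 1547381470000000
print(to_utc_micros(tzaware_datetime))  # 1547381470000000
print(tzaware_datetime.tzname())        # PST
```

The value 1547352670000000 reported for the tz-aware rows is exactly 8 hours (28,800,000,000 microseconds) lower than this, which is consistent with the local wall-clock time being stored without conversion to UTC.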

Environment:

$ python --version
Python 3.7.2

$ pip freeze
numpy==1.16.2
pyarrow==0.12.1
pytz==2018.9
six==1.12.0

$ sw_vers
ProductName: Mac OS X
ProductVersion: 10.14.3
BuildVersion: 18D109
Reporter: Tim Swast / @tswast
Assignee: Krisztian Szucs / @kszucs

Note: This issue was originally created as ARROW-4965. Please see the migration documentation for further details.
