[R] fix behaviour when converting timestamps with "" as tzone #30005

@asfimport

Description

From the comments, we've decided to go with option 3:

  • Set the timezone to local time without changing the integer value of the timestamp. We store whatever integer R passes to us (21600), with CST as the timezone set. Display is then "1970-01-01 00:00:00 CST".
    This is surprising because we are asserting the local timezone when that is not specified in R.

    ============================================

    POSIXct in R can have timezones specified as "" which is typically interpreted as the session local timezone.

    This can lead to surprising results like:

    > Sys.timezone()
    [1] "America/Chicago"
    > as.integer(as.POSIXct("1970-01-01"))
    [1] 21600
    > Sys.setenv(TZ = "UTC")
    > as.integer(as.POSIXct("1970-01-01"))
    [1] 0
    > Sys.setenv(TZ = "Australia/Brisbane")
    > as.integer(as.POSIXct("1970-01-01"))
    [1] -36000

    See also: https://stackoverflow.com/questions/69670142/how-can-i-store-timezone-agnostic-dates-for-sharing-between-r-and-python-using-p/69678923#69678923
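    The same arithmetic can be reproduced outside R. The following is a minimal Python sketch, not the R or Arrow implementation: it assumes fixed UTC offsets (-6 h for America/Chicago, +10 h for Australia/Brisbane, the offsets in effect on 1970-01-01) in place of the named zones, and `local_midnight_epoch` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for the session time zones in the R transcript.
# Assumption: these are the offsets in effect on 1970-01-01 (no DST involved).
SESSION_OFFSETS = {
    "America/Chicago": timezone(timedelta(hours=-6)),
    "UTC": timezone.utc,
    "Australia/Brisbane": timezone(timedelta(hours=10)),
}


def local_midnight_epoch(tz):
    """Seconds since the Unix epoch for 1970-01-01 00:00:00 read as wall-clock time in tz."""
    return int(datetime(1970, 1, 1, tzinfo=tz).timestamp())


# One integer per session time zone, for the same wall-clock string.
epochs = {name: local_midnight_epoch(tz) for name, tz in SESSION_OFFSETS.items()}

for name, value in epochs.items():
    print(f"{name}: {value}")
```

    The same wall-clock string yields a different integer for each session offset, which is the behaviour the transcript shows.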

    This runs counter to how timestamps without timezones are interpreted in Arrow:

    arrow/format/Schema.fbs

    Lines 333 to 336 in 0366943

    /// stored as a struct with Date and Time fields. However, it may also be
    /// encoded into a Timestamp column with an empty timezone. The timestamp
    /// values should be computed "as if" the timezone of the date-time values
    /// was UTC; for example, the naive date-time "January 1st 1970, 00h00" would

    However, it may also be encoded into a Timestamp column with an empty timezone. The timestamp values should be computed "as if" the timezone of the date-time values was UTC; for example, the naive date-time "January 1st 1970, 00h00" would be encoded as timestamp value 0.

    Critically in R, when as.POSIXct("1970-01-01 00:00:00") is run, the timestamp value is computed "as if" the timezone of the date-time values was the local timezone (and not UTC like the Arrow spec says).
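    To make the mismatch concrete, here is a small Python sketch of the two encodings, assuming a fixed -6 h offset for US Central time. The helper names `encode_as_if_utc` and `encode_as_if_local` are hypothetical and only illustrate the two rules; they are not Arrow or R functions.

```python
import calendar
from datetime import datetime, timedelta, timezone

# The naive date-time from the Arrow spec example: no time zone attached.
naive = datetime(1970, 1, 1, 0, 0, 0)

# Assumption: US Central time as a fixed UTC-6 offset (CST, no DST).
CST = timezone(timedelta(hours=-6), "CST")


def encode_as_if_utc(dt):
    """Arrow's rule for timezone-less timestamps: treat the wall-clock fields as UTC."""
    return calendar.timegm(dt.timetuple())


def encode_as_if_local(dt, local_tz):
    """R's behaviour for tzone "": treat the wall-clock fields as session-local time."""
    return int(dt.replace(tzinfo=local_tz).timestamp())


arrow_value = encode_as_if_utc(naive)
r_value = encode_as_if_local(naive, CST)

print(arrow_value, r_value)
```

    The two rules disagree by exactly the session's UTC offset, so the gap changes with the machine the R code runs on.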

    This can lead to some surprising results when converting these timezoneless timestamps from R to Arrow. Taking as.POSIXct("1970-01-01 00:00:00") as an example and assuming US Central time, we have a few options:

  • Warn, when the timezone is "" or not set, that the behavior might be surprising.
    We store whatever integer R passes to us (21600), with no timezone set. When someone sees this formatted, the times/dates will be what the time was at UTC ("1970-01-01 06:00:00").

  • Set the timezone to UTC without changing the integer value of the timestamp. We store whatever integer R passes to us (21600), with UTC as the timezone set. When someone sees this formatted, the times/dates will be in UTC ("1970-01-01 06:00:00 UTC"). This might be surprising / counterintuitive because the timestamps will suddenly be different: they will be based on UTC and not on local time as people expect.

  • Set the timezone to local time without changing the integer value of the timestamp. We store whatever integer R passes to us (21600), with CST as the timezone set. Display is then "1970-01-01 00:00:00 CST".
    This is surprising because we are asserting the local timezone when that is not specified in R.

    If someone is using a timestamp without tzone in R to represent a timezoneless timestamp, options 2 and 3 above violate that intent once the data is put into Arrow. If, on the other hand, someone is using a timestamp that just happens to lack a tzone but assumes it is in local time, option 1 leads to (very) surprising results.
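    The three options differ only in the timezone attached to the stored integer, not in the integer itself. A rough Python sketch of how 21600 would be displayed under each option, assuming CST as a fixed -6 h offset; `render` is a hypothetical formatter and not how Arrow prints timestamps internally:

```python
from datetime import datetime, timedelta, timezone

STORED = 21600  # the integer R hands over for as.POSIXct("1970-01-01 00:00:00") in US Central

# Assumption: CST modelled as a fixed UTC-6 offset.
CST = timezone(timedelta(hours=-6), "CST")


def render(value, tz=None):
    """Format an epoch value; with no timezone, show the UTC wall-clock fields unlabeled."""
    instant = datetime.fromtimestamp(value, tz=timezone.utc)
    if tz is None:
        return instant.strftime("%Y-%m-%d %H:%M:%S")
    return instant.astimezone(tz).strftime("%Y-%m-%d %H:%M:%S %Z")


option_1 = render(STORED)                # no timezone set
option_2 = render(STORED, timezone.utc)  # timezone set to UTC
option_3 = render(STORED, CST)           # timezone set to local time

for label, text in [("option 1", option_1), ("option 2", option_2), ("option 3", option_3)]:
    print(f"{label}: {text}")
```

    Only option 3 shows the wall-clock time the R user typed, at the cost of asserting a timezone that R never recorded.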

Reporter: Jonathan Keane / @jonkeane
Assignee: Dragoș Moldovan-Grünfeld / @dragosmg
Watchers: Rok Mihevc / @rok

Note: This issue was originally created as ARROW-14442. Please see the migration documentation for further details.
