Description
When parsing a string with a two-digit year (%y) to a datetime, it would be great if we could control the cutoff point between the 1900s and the 2000s. Currently the cutoff is implicitly set to 68, so 00-68 are parsed as 2000-2068 and 69-99 as 1969-1999:
library(arrow, warn.conflicts = FALSE)
a <- Array$create(c("68-05-17", "69-05-17"))
call_function("strptime", a, options = list(format = "%y-%m-%d", unit = 0L))
#> Array
#> <timestamp[s]>
#> [
#> 2068-05-17 00:00:00,
#> 1969-05-17 00:00:00
#> ]
For example, lubridate calls this argument cutoff_2000 (e.g. in fast_strptime()). It works as follows:
library(lubridate, warn.conflicts = FALSE)
dates_vector <- c("68-05-17", "69-05-17", "55-05-17")
fast_strptime(dates_vector, format = "%y-%m-%d")
#> [1] "2068-05-17 UTC" "1969-05-17 UTC" "2055-05-17 UTC"
fast_strptime(dates_vector, format = "%y-%m-%d", cutoff_2000 = 50)
#> [1] "1968-05-17 UTC" "1969-05-17 UTC" "1955-05-17 UTC"
fast_strptime(dates_vector, format = "%y-%m-%d", cutoff_2000 = 70)
#> [1] "2068-05-17 UTC" "2069-05-17 UTC" "2055-05-17 UTC"
In the lubridate::fast_strptime() documentation it is described as follows:
cutoff_2000
integer. For y format, two-digit numbers smaller or equal to cutoff_2000 are parsed as though starting with 20, otherwise parsed as though starting with 19. Available only for functions relying on lubridates internal parser.
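To make the quoted rule concrete, here is a minimal base R sketch of the year expansion it describes. The helper name expand_two_digit_year is hypothetical and is not part of arrow or lubridate. It only illustrates the "smaller or equal to cutoff_2000" comparison, not how either library implements parsing.

```r
# Hypothetical helper (not an arrow or lubridate function): expands
# two-digit years following the cutoff_2000 rule quoted above.
expand_two_digit_year <- function(yy, cutoff_2000 = 68L) {
  yy <- as.integer(yy)
  stopifnot(all(yy >= 0L & yy <= 99L, na.rm = TRUE))
  # Values <= cutoff_2000 are treated as 20yy, all others as 19yy.
  ifelse(yy <= cutoff_2000, 2000L + yy, 1900L + yy)
}

yy <- c(68L, 69L, 55L)
expand_two_digit_year(yy)
#> [1] 2068 1969 2055
expand_two_digit_year(yy, cutoff_2000 = 50L)
#> [1] 1968 1969 1955
expand_two_digit_year(yy, cutoff_2000 = 70L)
#> [1] 2068 2069 2055
```

The years produced match the fast_strptime() results shown earlier for the same three inputs, with the default of 68 standing in for the current implicit behaviour.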
Reporter: Dragoș Moldovan-Grünfeld / @dragosmg
Related issues:
- [C++] Strptime issues umbrella (is a child of)
Note: This issue was originally created as ARROW-16596. Please see the migration documentation for further details.