[C++] Add a strptime option to control the cutoff between 1900 and 2000 when %y  #31951

@asfimport

Description

When parsing a string with a two-digit year (%y) to a datetime, it would be great to have control over the cutoff point between 1900 and 2000. Currently the cutoff is implicitly set to 68, so 68 is parsed as 2068 and 69 as 1969:

library(arrow, warn.conflicts = FALSE)

a <- Array$create(c("68-05-17", "69-05-17"))
call_function("strptime", a, options = list(format = "%y-%m-%d", unit = 0L))
#> Array
#> <timestamp[s]>
#> [
#>   2068-05-17 00:00:00,
#>   1969-05-17 00:00:00
#> ]

For example, lubridate exposes this as the cutoff_2000 argument (e.g. in fast_strptime()). It works as follows:

library(lubridate, warn.conflicts = FALSE)

dates_vector <- c("68-05-17", "69-05-17", "55-05-17")
fast_strptime(dates_vector, format = "%y-%m-%d")
#> [1] "2068-05-17 UTC" "1969-05-17 UTC" "2055-05-17 UTC"
fast_strptime(dates_vector, format = "%y-%m-%d", cutoff_2000 = 50)
#> [1] "1968-05-17 UTC" "1969-05-17 UTC" "1955-05-17 UTC"
fast_strptime(dates_vector, format = "%y-%m-%d", cutoff_2000 = 70)
#> [1] "2068-05-17 UTC" "2069-05-17 UTC" "2055-05-17 UTC"

In the lubridate::fast_strptime() documentation it is described as follows:

cutoff_2000
integer. For y format, two-digit numbers smaller or equal to cutoff_2000 are parsed as though starting with 20, otherwise parsed as though starting with 19. Available only for functions relying on lubridates internal parser.
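
To make the requested behaviour concrete, here is a minimal C++ sketch of the rule quoted above. It is only an illustration: ExpandTwoDigitYear and its cutoff_2000 parameter are hypothetical names, not part of Arrow's existing strptime kernel or its options, and the default of 68 simply mirrors the current implicit behaviour shown earlier.

```cpp
#include <stdexcept>

// Hypothetical helper (not an Arrow API): expand a two-digit year using a
// lubridate-style cutoff. Values less than or equal to cutoff_2000 are
// treated as 20yy, all other values as 19yy.
int ExpandTwoDigitYear(int yy, int cutoff_2000 = 68) {
  if (yy < 0 || yy > 99) {
    throw std::invalid_argument("two-digit year must be in [0, 99]");
  }
  const int century = (yy <= cutoff_2000) ? 2000 : 1900;
  return century + yy;
}
```

With the default cutoff, ExpandTwoDigitYear(68) gives 2068 and ExpandTwoDigitYear(69) gives 1969, matching the Arrow output above. With a cutoff of 50, ExpandTwoDigitYear(55, 50) gives 1955, matching the lubridate example with cutoff_2000 = 50.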

Reporter: Dragoș Moldovan-Grünfeld / @dragosmg

Note: This issue was originally created as ARROW-16596. Please see the migration documentation for further details.
