Description
The TimestampParser seems to be able to cycle through several formats. This sort of functionality would be very useful for some of the lubridate bindings that need to behave in a similar way.
library(arrow)
library(fs)
library(readr)
library(tibble)
tf <- fs::file_temp(ext = "csv")
fs::file_create(tf)
sample_times <- tibble(a = c("09/13/2013", "25/12/1998", "09-13-13", "23_Feb_2022", "09/13/2018"))
write_csv(sample_times, tf)
read_csv_arrow(
  tf,
  as_data_frame = TRUE,
  timestamp_parsers = c("%m/%d/%Y", "%d/%m/%Y", "%m-%d-%y", "%d_%b_%Y")
)
#> # A tibble: 5 × 1
#> a
#> <dttm>
#> 1 2013-09-13 01:00:00
#> 2 1998-12-25 00:00:00
#> 3 2013-09-13 01:00:00
#> 4 2022-02-23 00:00:00
#> 5 2018-09-13 01:00:00
For example, in lubridate, ymd() cycles through all possible formats that have year, month, and day components in that order (e.g. "%Y-%m-%d", "%y-%m-%d", "%Y-%b-%d", "%y-%b-%d", "%Y-%B-%d", "%y-%B-%d", etc.).
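To make the desired behaviour concrete, here is a rough base R sketch of "try each format in turn and keep the first one that parses". It is only an illustration of the semantics, not how lubridate or Arrow implement it; `parse_with_formats()` is a hypothetical helper name, and it returns dates rather than timestamps for simplicity.

```r
# Hypothetical helper: try each strptime-style format in order and keep the
# first successful parse for every element. Illustrative only; this is not
# the lubridate or Arrow implementation.
parse_with_formats <- function(x, formats) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)
    if (!any(todo)) break
    # as.Date() returns NA where the string does not match `fmt`
    out[todo] <- as.Date(x[todo], format = fmt)
  }
  out
}

# Same strings and formats as the CSV example above.
# Note: "%b" matches month names in the current locale.
parse_with_formats(
  c("09/13/2013", "25/12/1998", "09-13-13", "23_Feb_2022", "09/13/2018"),
  c("%m/%d/%Y", "%d/%m/%Y", "%m-%d-%y", "%d_%b_%Y")
)
```

As with `timestamp_parsers`, the order of the formats matters: an ambiguous string such as "01/02/2013" is resolved by whichever matching format comes first.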
I guess my question is: can we factor out this CSV reader feature so that it is usable elsewhere? This was the bit that caught my attention: "using the virtual parser interface in arrow/util/value_parsing.h", which suggested to me that using it elsewhere might be possible.
Reporter: Dragoș Moldovan-Grünfeld / @dragosmg
Related issues:
- [C++] Strptime issues umbrella (is a child of)
Note: This issue was originally created as ARROW-15912. Please see the migration documentation for further details.