Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,8 @@

1. `:=` no longer recycles length>1 RHS vectors. There was a warning when recycling left a remainder but no warning when the LHS length was an exact multiple of the RHS length (the same behaviour as base R). Consistent feedback for several years has been that recycling is more often a bug. In rare cases where you need to recycle a length>1 vector, use `rep()` explicitly. Single values are still recycled silently as before. Early warning was given in [this tweet](https://twitter.com/MattDowle/status/1088544083499311104). The 758 CRAN and Bioconductor packages using data.table were tested and the maintainers of the 16 packages affected (2%) were consulted before going ahead, [#3310](https://github.com/Rdatatable/data.table/pull/3310).

2. `as.IDate.default` and `as.ITime.default` gain `use_lookup` argument, default `'auto'`, [#2603](https://github.com/Rdatatable/data.table/issues/2603). In data sets with millions of observations, there are almost universally a much smaller number of unique dates in any given column. In this case, it is much more efficient to parse only the unique set of dates and use this as a lookup table for the full vector; `use_lookup` implements this logic. By default this is done for any conversion of at least 1,000 observations.

#### BUG FIXES

1. `rbindlist()` of a malformed factor missing levels attribute is now a helpful error rather than a cryptic error about `STRING_ELT`, [#3315](https://github.com/Rdatatable/data.table/issues/3315). Thanks to Michael Chirico for reporting.
Expand Down
25 changes: 21 additions & 4 deletions R/IDateTime.R
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,17 @@

as.IDate <- function(x, ...) UseMethod("as.IDate")

as.IDate.default <- function(x, ..., tz = attr(x, "tzone")) {
as.IDate.default <- function(x, ..., tz = attr(x, "tzone"), use_lookup = 'auto') {
if (is.null(tz)) tz = "UTC"
as.IDate(as.Date(x, tz = tz, ...))
if (isTRUE(use_lookup) || (use_lookup == 'auto' && length(x) >= 1000L)) {
Copy link
Copy Markdown
Member

@jangorecki jangorecki Feb 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will it behave when user provide some custom class to this method? This is default method so it can get any kind of input. Or am I missing something?
Where is check that we support merge on x arg that we discussed before?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR still incomplete...

DT = list(input = x)
setDT(DT)

lookup = unique(DT)
lookup[ , 'IDate' := as.IDate(as.Date(input, tz = tz, ...))]
lookup[DT, on = 'input']$IDate
} else {
as.IDate(as.Date(x, tz = tz, ...))
Comment thread
MichaelChirico marked this conversation as resolved.
}

as.IDate.numeric <- function(x, origin = "1970-01-01", ...) {
Expand Down Expand Up @@ -111,8 +119,17 @@ round.IDate <- function (x, digits=c("weeks", "months", "quarters", "years"), ..

as.ITime <- function(x, ...) UseMethod("as.ITime")

as.ITime.default <- function(x, ...) {
as.ITime(as.POSIXlt(x, ...), ...)
as.ITime.default <- function(x, ..., use_lookup = 'auto') {
if (isTRUE(use_lookup) || (use_lookup == 'auto' && length(x) >= 1000)) {
DT = list(input = x)
setDT(DT)

lookup = unique(DT)
lookup[ , 'ITime' := as.ITime(as.POSIXlt(input, ...), ...)]
lookup[DT, on = 'input']$ITime
} else {
as.ITime(as.POSIXlt(x, ...), ...)
}
}

as.ITime.POSIXct <- function(x, tz = attr(x, "tzone"), ...) {
Expand Down
12 changes: 12 additions & 0 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -13414,6 +13414,18 @@ DT = data.table(a = c(1:3, 3:1))
test(1984.38, rowidv(DT, prefix = 5L), error='must be NULL or a character vector')
test(1984.39, rowidv(DT, prefix = c('hey','you')), error='must be NULL or a character vector')

# #2306, use_merge argument of as.IDate.default
char_date = as.character(.Date(17532:17896))
test(1985.1, as.IDate(char_date, use_lookup = TRUE),
as.IDate(char_date, use_lookup = FALSE))
test(1985.2, as.IDate(char_date, use_lookup = 'auto'),
as.IDate(char_date, use_lookup = FALSE))

char_time = format(.POSIXct(1549503263:1549503312, 'UTC'), '%T')
test(1985.3, as.ITime(char_time, use_lookup = TRUE),
as.ITime(char_time, use_lookup = FALSE))
test(1985.4, as.ITime(char_time, use_lookup = 'auto'),
as.ITime(char_time, use_lookup = FALSE))

###################################
# Add new tests above this line #
Expand Down
16 changes: 14 additions & 2 deletions man/IDateTime.Rd
Original file line number Diff line number Diff line change
Expand Up @@ -48,14 +48,14 @@
}
\usage{
as.IDate(x, \dots)
\method{as.IDate}{default}(x, \dots, tz = attr(x, "tzone"))
\method{as.IDate}{default}(x, \dots, tz = attr(x, "tzone"), use_lookup = 'auto')
\method{as.IDate}{Date}(x, \dots)
\method{as.Date}{IDate}(x, \dots)
\method{as.POSIXct}{IDate}(x, tz = "UTC", time = 0, \dots)
\method{round}{IDate}(x, digits = c("weeks", "months", "quarters","years"), \ldots)

as.ITime(x, \dots)
\method{as.ITime}{default}(x, \dots)
\method{as.ITime}{default}(x, \dots, use_lookup = 'auto')
\method{as.ITime}{POSIXlt}(x, ms = 'truncate', \dots)

\method{as.POSIXct}{ITime}(x, tz = "UTC", date = Sys.Date(), \dots)
Expand Down Expand Up @@ -85,6 +85,7 @@ year(x)
\code{as.IDate.default}, arguments are passed to \code{as.Date}. For
\code{as.ITime.default}, arguments are passed to \code{as.POSIXlt}.}
\item{tz}{time zone (see \code{strptime}).}
\item{use_lookup}{ \code{logical}; should the parsing be done by via merge? See Details. }
\item{date}{date object convertable with \code{as.IDate}.}
\item{time}{time-of-day object convertable with \code{as.ITime}.}
\item{digits}{really \code{units}; one of the units listed for rounding. May be abbreviated.}
Expand All @@ -96,6 +97,17 @@ internal representation as the \code{Date} class, except the storage
mode is integer. \code{IDate} is a relatively simple wrapper, and it
should work in almost all situations as a replacement for \code{Date}.

The \code{use_merge} argument to \code{as.IDate.default} and \code{as.ITime.default} was implemented
based on the observation that the number of unique dates/times is
typically a small percentage of the number of total observations -- as such,
it is inefficient to repeatedly parse the same date/time potentially thousands or
millions of times. Instead, when \code{use_merge = TRUE}, conversion is done
using parsing on the set of unique dates/times, which is then merged to the complete
set of dates/times. By default, \code{use_merge = 'auto'}, in which case this method
will be applied when \code{length(x) >= 1000} since the overhead of merging
was found to exceed the efficiency gains of merging in small data sets. As a
rule, the performance improves with \code{1 - uniqueN(x)/length(x)}.

Functions that use \code{Date} objects generally work for
\code{IDate} objects. This package provides specific methods for
\code{IDate} objects for \code{mean}, \code{cut}, \code{seq}, \code{c},
Expand Down