From 191501431b2f5308aefba8f43cf871a607232327 Mon Sep 17 00:00:00 2001 From: Michael Chirico Date: Thu, 3 Sep 2020 08:49:48 -0400 Subject: [PATCH 1/6] mirror fix to base R for as.IDate --- NEWS.md | 2 ++ R/IDateTime.R | 4 ++++ inst/tests/tests.Rraw | 4 ++++ 3 files changed, 10 insertions(+) diff --git a/NEWS.md b/NEWS.md index 62b0415c0a..405b17c338 100644 --- a/NEWS.md +++ b/NEWS.md @@ -12,6 +12,8 @@ 2. A regression in v1.13.0 resulted in installation on Mac often failing with `shared object 'datatable.so' not found`, and FreeBSD always failing with `expr: illegal option -- l`, [#4652](https://github.com/Rdatatable/data.table/issues/4652) [#4640](https://github.com/Rdatatable/data.table/issues/4640) [#4650](https://github.com/Rdatatable/data.table/issues/4650). Thanks to many for assistance including Simon Urbanek, Brian Ripley, Wes Morgan, and @ale07alvarez. There were no installation problems on Windows or Linux. +3. `as.IDate` works on character input where the first input is `''` by treating it as `NA`, [#4676](https://github.com/Rdatatable/data.table/issues/4676). This is consistent with how `''` is treated when it appears later in the input; the underlying inconsistency in `as.Date.character` was also fixed in `r-devel` revision 79119; we applied the change in `data.table` as well because we depend only on a much earlier version of R. + ## NOTES 1. `bit64` v4.0.2 and `bit` v4.0.3, both released on 30th July, broke `data.table`'s tests. It seems that reverse dependency testing of `bit64` (i.e. testing of the packages which use `bit64`) did not include `data.table` because `data.table` suggests `bit64` but does not depend on it. Like other packages on our `Suggest` list, we test `data.table` works with `bit64` in our tests. In testing of our own reverse dependencies (packages which use `data.table`) we do include packages which suggest `data.table`, although it appears it is not CRAN policy to do so. We have requested that CRAN policy be improved to include suggests in reverse dependency testing. diff --git a/R/IDateTime.R b/R/IDateTime.R index 0c0be82e83..c9c94d3421 100644 --- a/R/IDateTime.R +++ b/R/IDateTime.R @@ -7,6 +7,10 @@ as.IDate = function(x, ...) UseMethod("as.IDate") as.IDate.default = function(x, ..., tz = attr(x, "tzone", exact=TRUE)) { if (is.null(tz)) tz = "UTC" + # 4676 mimics for back-compatibility a similar patch applied to as.Date.character in r79119 + if (is.character(x)) { + is.na(x) = !nzchar(x) + } as.IDate(as.Date(x, tz = tz, ...)) } diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index c966a5efc6..782b38e0b3 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -17127,3 +17127,7 @@ test(2151, dcast(DT, 1 ~ a, value.var='survmean'), data.table('.'='.', s=1L, x=2 y = person(given='Joel', family='Mossong') test(2152, copy(y), y) +# initial '' doesn't crash as.IDate, #4676; +# also imitates base R as from 4.0.2 patched +y = c('', '2020-01-01') +test(2153, as.IDate(y), rev(as.IDate(rev(y)))) From 79438c40f89849a95ff4d4e70821a860ce83a4bb Mon Sep 17 00:00:00 2001 From: Michael Chirico Date: Sun, 9 May 2021 00:37:15 -0700 Subject: [PATCH 2/6] mention the released R version, not an SVN revision --- NEWS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/NEWS.md b/NEWS.md index 9810aea26f..ab5a58477b 100644 --- a/NEWS.md +++ b/NEWS.md @@ -65,7 +65,7 @@ 9. `data.table(NULL)[, firstCol:=1L]` created `data.table(firstCol=1L)` ok but did not update the internal `row.names` attribute, causing `Error in '$<-.data.frame'(x, name, value) : replacement has 1 row, data has 0` when passed to packages like `ggplot` which use `DT` as if it is a `data.frame`, [#4597](https://github.com/Rdatatable/data.table/issues/4597). Thanks to Matthew Son for reporting, and Cole Miller for the PR. -10. `as.IDate` works on character input where the first input is `''` by treating it as `NA`, [#4676](https://github.com/Rdatatable/data.table/issues/4676). This is consistent with how `''` is treated when it appears later in the input; the underlying inconsistency in `as.Date.character` was also fixed in `r-devel` revision 79119; we applied the change in `data.table` as well because we depend only on a much earlier version of R. +10. `as.IDate` works on character input where the first input is `''` by treating it as `NA`, [#4676](https://github.com/Rdatatable/data.table/issues/4676). This is consistent with how `''` is treated when it appears later in the input; the underlying inconsistency in `as.Date.character` was fixed in R version 4.0.3; we applied the change in `data.table` as well because we depend only on a much earlier version of R. ## NOTES From 65e401e7c84106eabfa31a538c7ee483a0c3565f Mon Sep 17 00:00:00 2001 From: Matt Dowle Date: Wed, 26 May 2021 17:27:42 -0600 Subject: [PATCH 3/6] news item rework --- NEWS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/NEWS.md b/NEWS.md index c7cccb33e5..11eb9c50be 100644 --- a/NEWS.md +++ b/NEWS.md @@ -131,7 +131,7 @@ 18. `as.data.table()` on `xts` objects containing a column named `x` would return an `index` of type plain `integer` rather than `POSIXct`, [#4897](https://github.com/Rdatatable/data.table/issues/4897). Thanks to Emil Sjørup for reporting, and Jan Gorecki for the PR. -19. `as.IDate` returns `NA` instead of erroring when the initial input is `''`, [#4676](https://github.com/Rdatatable/data.table/issues/4676). This is consistent with how `''` is treated when it appears later in the input. The same change was applied to `as.Date.character` in r79119 (R 4.0.3); we backported it here to continue supporting R 3.1.0. +19. A fix to `as.Date(c("", ...))` in R 4.0.3, [17909](https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17909), has been backported to `data.table::as.IDate()` so that it too now returns `NA` for the first item when it is blank, even in older versions of R back to 3.1.0, rather than the incorrect error `character string is not in a standard unambiguous format`, [#4676](https://github.com/Rdatatable/data.table/issues/4676). Thanks to Arun Srinivasan for reporting, and Michaal Chirico both for the `data.table` PR and for submitting the patch to R that was accepted and included in R 4.0.3. ## NOTES From e1345bf17ca05a918408e3fd6ee9e790b0adfe44 Mon Sep 17 00:00:00 2001 From: Matt Dowle Date: Wed, 26 May 2021 17:40:52 -0600 Subject: [PATCH 4/6] fixed-value y's --- inst/tests/tests.Rraw | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index 11c1cbe694..7b92fd9518 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -17705,8 +17705,7 @@ test(2190.9, DT[1:2, a:=call('sum', 1)], error="type 'language' cannot be coerce DT = data.table(i1 = c(234L, 250L, 169L, 234L, 147L, 96L, 96L, 369L, 147L, 96L), i4 = c(79L, 113L, 270L, -121L, 113L, 113L, -121L, 179L, -228L, 113L), v = 0) test(2191, DT[1:5, sum(v), by=.(i5 = 1:5 %% 2L), verbose=TRUE], data.table(i5=1:0, V1=c(0,0)), output="gforce") -# initial '' doesn't crash as.IDate, #4676; also imitates base R as from 4.0.2 patched -y = c('', '2020-01-01') -## consistency check -- y parses the same in either order -test(2192.1, as.IDate(y), rev(as.IDate(rev(y)))) -test(2192.2, as.IDate(y), as.IDate(c(NA_integer_, 18262L))) +# as.IDate was error when first item blank, #4676 +test(2192.1, as.IDate(c('', '2020-01-01')), structure(c(NA_integer_, 18262L), class=c("IDate","Date"))) +test(2192.2, as.IDate(c('2020-01-01', '')), structure(c(18262L, NA_integer_), class=c("IDate","Date"))) + From c14bb8736beb73c2f78b84ba38c4dd60eadf0473 Mon Sep 17 00:00:00 2001 From: Matt Dowle Date: Wed, 26 May 2021 17:45:10 -0600 Subject: [PATCH 5/6] comment tweak --- inst/tests/tests.Rraw | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index 7b92fd9518..218de465da 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -17705,7 +17705,7 @@ test(2190.9, DT[1:2, a:=call('sum', 1)], error="type 'language' cannot be coerce DT = data.table(i1 = c(234L, 250L, 169L, 234L, 147L, 96L, 96L, 369L, 147L, 96L), i4 = c(79L, 113L, 270L, -121L, 113L, 113L, -121L, 179L, -228L, 113L), v = 0) test(2191, DT[1:5, sum(v), by=.(i5 = 1:5 %% 2L), verbose=TRUE], data.table(i5=1:0, V1=c(0,0)), output="gforce") -# as.IDate was error when first item blank, #4676 +# base::as.Date was error when first item blank, affecting as.IDate, #4676 test(2192.1, as.IDate(c('', '2020-01-01')), structure(c(NA_integer_, 18262L), class=c("IDate","Date"))) test(2192.2, as.IDate(c('2020-01-01', '')), structure(c(18262L, NA_integer_), class=c("IDate","Date"))) From 96f5bb6883a72293a02b38b11d0a10cd8911e517 Mon Sep 17 00:00:00 2001 From: Matt Dowle Date: Wed, 26 May 2021 18:02:34 -0600 Subject: [PATCH 6/6] typo news item --- NEWS.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/NEWS.md b/NEWS.md index 11eb9c50be..4e108fae47 100644 --- a/NEWS.md +++ b/NEWS.md @@ -43,7 +43,7 @@ Item 2 has 0 rows but longest item has 3; filled with NA ``` -5. `%like%` on factors with a large number of levels is now faster, [#4748](https://github.com/Rdatatable/data.table/issues/4748). The example in the PR shows 2.37s reduced to 0.86s on a factor lengh 100 million containing 1 million unique 10-character strings. Thanks to @statquant for reporting, and @shrektan for implementing. +5. `%like%` on factors with a large number of levels is now faster, [#4748](https://github.com/Rdatatable/data.table/issues/4748). The example in the PR shows 2.37s reduced to 0.86s on a factor length 100 million containing 1 million unique 10-character strings. Thanks to @statquant for reporting, and @shrektan for implementing. 6. `keyby=` now accepts `TRUE`/`FALSE` together with `by=`, [#4307](https://github.com/Rdatatable/data.table/issues/4307). The primary motivation is benchmarking where `by=` vs `keyby=` is varied across a set of queries. Thanks to Jan Gorecki for the request and the PR. @@ -131,7 +131,7 @@ 18. `as.data.table()` on `xts` objects containing a column named `x` would return an `index` of type plain `integer` rather than `POSIXct`, [#4897](https://github.com/Rdatatable/data.table/issues/4897). Thanks to Emil Sjørup for reporting, and Jan Gorecki for the PR. -19. A fix to `as.Date(c("", ...))` in R 4.0.3, [17909](https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17909), has been backported to `data.table::as.IDate()` so that it too now returns `NA` for the first item when it is blank, even in older versions of R back to 3.1.0, rather than the incorrect error `character string is not in a standard unambiguous format`, [#4676](https://github.com/Rdatatable/data.table/issues/4676). Thanks to Arun Srinivasan for reporting, and Michaal Chirico both for the `data.table` PR and for submitting the patch to R that was accepted and included in R 4.0.3. +19. A fix to `as.Date(c("", ...))` in R 4.0.3, [17909](https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=17909), has been backported to `data.table::as.IDate()` so that it too now returns `NA` for the first item when it is blank, even in older versions of R back to 3.1.0, rather than the incorrect error `character string is not in a standard unambiguous format`, [#4676](https://github.com/Rdatatable/data.table/issues/4676). Thanks to Arun Srinivasan for reporting, and Michael Chirico both for the `data.table` PR and for submitting the patch to R that was accepted and included in R 4.0.3. ## NOTES