From 388beedbe78834d79fd61886d3b2417fb9178241 Mon Sep 17 00:00:00 2001 From: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com> Date: Sun, 12 Dec 2021 21:56:14 +0100 Subject: [PATCH 1/4] only assign copied value in := if not null --- NEWS.md | 2 ++ R/data.table.R | 2 +- inst/tests/tests.Rraw | 4 ++++ 3 files changed, 7 insertions(+), 1 deletion(-) diff --git a/NEWS.md b/NEWS.md index aad41a0ccb..023f67bb21 100644 --- a/NEWS.md +++ b/NEWS.md @@ -544,6 +544,8 @@ 51. `merge.data.table()` silently ignored the `incomparables` argument, [#2587](https://github.com/Rdatatable/data.table/issues/2587). It is now implemented and any other ignored arguments (e.g. misspellings) are now warned about. Thanks to @GBsuperman for the report and @ben-schwen for the fix. +52. assigning j with `:=` could lead to shorter j and recycling if j contained `NULL` values, [#5284](https://github.com/Rdatatable/data.table/issues/5284). Thanks @eutwt for the report, and @ben-schwen for the fix. + ## NOTES 1. New feature 29 in v1.12.4 (Oct 2019) introduced zero-copy coercion. Our thinking is that requiring you to get the type right in the case of `0` (type double) vs `0L` (type integer) is too inconvenient for you the user. So such coercions happen in `data.table` automatically without warning. Thanks to zero-copy coercion there is no speed penalty, even when calling `set()` many times in a loop, so there's no speed penalty to warn you about either. However, we believe that assigning a character value such as `"2"` into an integer column is more likely to be a user mistake that you would like to be warned about. The type difference (character vs integer) may be the only clue that you have selected the wrong column, or typed the wrong variable to be assigned to that column. For this reason we view character to numeric-like coercion differently and will warn about it. If it is correct, then the warning is intended to nudge you to wrap the RHS with `as.()` so that it is clear to readers of your code that a coercion from character to that type is intended. For example : diff --git a/R/data.table.R b/R/data.table.R index e671a208df..7f0f256a74 100644 --- a/R/data.table.R +++ b/R/data.table.R @@ -1377,7 +1377,7 @@ replace_dot_alias = function(e) { } else if (address(jval) == address(SDenv$.SD)) { jval = copy(jval) } else if ( length(jcpy <- which(vapply_1c(jval, address) %chin% vapply_1c(SDenv, address))) ) { - for (jidx in jcpy) jval[[jidx]] = copy(jval[[jidx]]) + for (jidx in jcpy) { if(!is.null(jval[[jidx]])) jval[[jidx]] = copy(jval[[jidx]]) } } else if (jsub %iscall% 'get') { jval = copy(jval) # fix for #1212 } diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index fc7e14f753..56bd155e70 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -18702,3 +18702,7 @@ test(2234.7, DT[, min(.SD), by=c(.I)], data.table(I=1L:5L, V1=c(1L, 2L, 3L, 2L, test(2234.8, DT[, min(.SD), by=.I%%2L], error="by.*contains .I.*supported") # would be nice to support in future; i.e. by odd/even rows, and by=(.I+1L)%/%2L for pairs of rows; i.e. any expression of .I test(2234.9, DT[, min(.SD), by=somefun(.I)], error="by.*contains .I.*supported") +# copying values of j could lead to recycling if j is a list containing NULL +DT = data.table(x = 1) +test(2235.1, copy(DT)[, c("z", "x") := {x = NULL; list(2, NULL)}], data.table(z = 2)) +test(2235.2, copy(DT)[, c("z", "x") := {list(2, NULL)}], data.table(z = 2)) From 78fd63905fe3d60b5d978fe017f7507dbecebc68 Mon Sep 17 00:00:00 2001 From: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com> Date: Sun, 19 Dec 2021 17:40:35 +0100 Subject: [PATCH 2/4] add issue number to tests --- inst/tests/tests.Rraw | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index 56bd155e70..58e03c1af7 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -18702,7 +18702,7 @@ test(2234.7, DT[, min(.SD), by=c(.I)], data.table(I=1L:5L, V1=c(1L, 2L, 3L, 2L, test(2234.8, DT[, min(.SD), by=.I%%2L], error="by.*contains .I.*supported") # would be nice to support in future; i.e. by odd/even rows, and by=(.I+1L)%/%2L for pairs of rows; i.e. any expression of .I test(2234.9, DT[, min(.SD), by=somefun(.I)], error="by.*contains .I.*supported") -# copying values of j could lead to recycling if j is a list containing NULL +# copying values of j could lead to recycling if j is a list containing NULL #5284 DT = data.table(x = 1) test(2235.1, copy(DT)[, c("z", "x") := {x = NULL; list(2, NULL)}], data.table(z = 2)) test(2235.2, copy(DT)[, c("z", "x") := {list(2, NULL)}], data.table(z = 2)) From 420180c66d7f5338bac272c51970cf74a49eee7a Mon Sep 17 00:00:00 2001 From: mattdowle Date: Tue, 15 Mar 2022 06:37:39 -0600 Subject: [PATCH 3/4] code example to start news item with explanation --- NEWS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/NEWS.md b/NEWS.md index bdad0f9f6d..6a759f7460 100644 --- a/NEWS.md +++ b/NEWS.md @@ -544,7 +544,7 @@ 51. `merge.data.table()` silently ignored the `incomparables` argument, [#2587](https://github.com/Rdatatable/data.table/issues/2587). It is now implemented and any other ignored arguments (e.g. misspellings) are now warned about. Thanks to @GBsuperman for the report and @ben-schwen for the fix. -52. assigning j with `:=` could lead to shorter j and recycling if j contained `NULL` values, [#5284](https://github.com/Rdatatable/data.table/issues/5284). Thanks @eutwt for the report, and @ben-schwen for the fix. +52. `DT[, c('z','x') := {x=NULL; list(2,NULL)}]` now removes column `x` as expected, [#5284](https://github.com/Rdatatable/data.table/issues/5284). The `x=NULL` is superfluous while the `list(2,NULL)` is the final value of `{}` whose items correspond to `c('z','x')`. Thanks @eutwt for the report, and @ben-schwen for the fix. ## NOTES From 47d54e0bc6160b016ebf90c9f47274d5970a9af9 Mon Sep 17 00:00:00 2001 From: mattdowle Date: Tue, 15 Mar 2022 06:41:34 -0600 Subject: [PATCH 4/4] news item more detail on what it did wrong before --- NEWS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/NEWS.md b/NEWS.md index 6a759f7460..a3f2fe70cd 100644 --- a/NEWS.md +++ b/NEWS.md @@ -544,7 +544,7 @@ 51. `merge.data.table()` silently ignored the `incomparables` argument, [#2587](https://github.com/Rdatatable/data.table/issues/2587). It is now implemented and any other ignored arguments (e.g. misspellings) are now warned about. Thanks to @GBsuperman for the report and @ben-schwen for the fix. -52. `DT[, c('z','x') := {x=NULL; list(2,NULL)}]` now removes column `x` as expected, [#5284](https://github.com/Rdatatable/data.table/issues/5284). The `x=NULL` is superfluous while the `list(2,NULL)` is the final value of `{}` whose items correspond to `c('z','x')`. Thanks @eutwt for the report, and @ben-schwen for the fix. +52. `DT[, c('z','x') := {x=NULL; list(2,NULL)}]` now removes column `x` as expected rather than incorrectly assigning `2` to `x` as well as `z`, [#5284](https://github.com/Rdatatable/data.table/issues/5284). The `x=NULL` is superfluous while the `list(2,NULL)` is the final value of `{}` whose items correspond to `c('z','x')`. Thanks @eutwt for the report, and @ben-schwen for the fix. ## NOTES