From d7e947b1dfc5ba998e647871fedccc339cce1802 Mon Sep 17 00:00:00 2001 From: shrektan Date: Fri, 4 Dec 2020 01:28:23 +0800 Subject: [PATCH 1/4] should set slen=1 when source is changed to scalar --- src/assign.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/assign.c b/src/assign.c index 5c0b808707..c3527003dd 100644 --- a/src/assign.c +++ b/src/assign.c @@ -682,7 +682,7 @@ const char *memrecycle(const SEXP target, const SEXP where, const int start, con // sourceLen==1 is used in dogroups to recycle the group values into ans to match the nrow of each group's result; sourceStart is set to each group value row. { if (len<1) return NULL; - const int slen = sourceLen>=0 ? sourceLen : length(source); + int slen = sourceLen>=0 ? sourceLen : length(source); // since source may get reassigned to a scalar, we should not mark it as const if (slen==0) return NULL; if (sourceStart<0 || sourceStart+slen>length(source)) error(_("Internal error memrecycle: sourceStart=%d sourceLen=%d length(source)=%d"), sourceStart, sourceLen, length(source)); // # nocov @@ -708,7 +708,7 @@ const char *memrecycle(const SEXP target, const SEXP where, const int start, con } else if (!sourceIsFactor && !isString(source)) { // target is factor if (allNA(source, false)) { // return false for list and other types that allNA does not support - source = ScalarLogical(NA_LOGICAL); // a global constant in R and won't allocate; fall through to regular zero-copy coerce + source = ScalarLogical(NA_LOGICAL); slen = 1; // a global constant in R and won't allocate; fall through to regular zero-copy coerce } else if (isInteger(source) || isReal(source)) { // allow assigning level numbers to factor columns; test 425, 426, 429 and 1945 const int nlevel = length(getAttrib(target, R_LevelsSymbol)); From bec47815733fd700cf32c511909629b94e1b2265 Mon Sep 17 00:00:00 2001 From: shrektan Date: Fri, 4 Dec 2020 01:35:31 +0800 Subject: [PATCH 2/4] add test --- inst/tests/tests.Rraw | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index 9d207f8866..53217fec25 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -17235,3 +17235,8 @@ test(2160.2, x[, v0 := fcase( )][c(1,3,74,96,100), round(v0,1)], c(0, -24.7, 82.5, 6.7, 0)) rm(x) +# all-na-vector (not scalar) assigned to a factor column should be ok, #4824 +dt = data.table(FACTOR = factor(rep("a", 3L))) +set(dt, i = 1:2, j = "FACTOR", value = rep(NA, 2L)) +test(2161, dt$FACTOR, factor(c(NA, NA, "a"))) +rm(dt) From 3c67d285eb346fdbbe3065d2f244eaa5e29ae201 Mon Sep 17 00:00:00 2001 From: shrektan Date: Fri, 4 Dec 2020 01:44:43 +0800 Subject: [PATCH 3/4] add an NEWS entry --- NEWS.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/NEWS.md b/NEWS.md index afde983b46..3095ca796e 100644 --- a/NEWS.md +++ b/NEWS.md @@ -10,6 +10,8 @@ 1. `as.matrix()` now retains the column type for the empty matrix result, [#4762](https://github.com/Rdatatable/data.table/issues/4762). Thus, for example, `min(DT[0])` where DT's columns are numeric, is now consistent with non-empty all-NA input and returns `Inf` with R's warning `no non-missing arguments to min; returning Inf` rather than R's error `only defined on a data frame with all numeric[-alike] variables`. Thanks to @mb706 for reporting. +2. Fix an issue that assigning an all-NA-vector to a factor column in data.table may result in segfaults, [#4824](https://github.com/Rdatatable/data.table/issues/4824). Thanks to @clerousset for reporting and @shrektan for fixing. + ## NOTES 1. Continuous daily testing by CRAN using latest daily R-devel revealed, within one day of the change to R-devel, that a future version of R would break one of our tests, [#4769](https://github.com/Rdatatable/data.table/issues/4769). The characters "-alike" were added into one of R's error messages, so our too-strict test which expected the error `only defined on a data frame with all numeric variables` will fail when it sees the new error message `only defined on a data frame with all numeric-alike variables`. We have relaxed the pattern the test looks for to `data.*frame.*numeric` well in advance of the future version of R being released. Readers are reminded that CRAN is not just a host for packages. It is also a giant test suite for R-devel. For more information, [behind the scenes of cran, 2016](https://www.h2o.ai/blog/behind-the-scenes-of-cran/). From 3d7f52c10bd87460e73fd97420666c0e4b69052a Mon Sep 17 00:00:00 2001 From: Matt Dowle Date: Fri, 7 May 2021 01:32:30 -0600 Subject: [PATCH 4/4] moved news item up and tweaked --- NEWS.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/NEWS.md b/NEWS.md index 47d045cbb8..ad673a4417 100644 --- a/NEWS.md +++ b/NEWS.md @@ -76,6 +76,8 @@ 10. `X[Y, .SD, by=]` (joining and grouping in the same query) could segfault if i) `by=` is supplied custom data (i.e. not simple expressions of columns), and ii) some rows of `Y` do not match to any rows in `X`, [#4892](https://github.com/Rdatatable/data.table/issues/4892). Thanks to @Kodiologist for reporting, @ColeMiller1 for investigating, and @tlapak for the PR. +11. Assigning a set of 2 or more all-NA values to a factor column could segfault, [#4824](https://github.com/Rdatatable/data.table/issues/4824). Thanks to @clerousset for reporting and @shrektan for fixing. + ## NOTES 1. New feature 29 in v1.12.4 (Oct 2019) introduced zero-copy coercion. Our thinking is that requiring you to get the type right in the case of `0` (type double) vs `0L` (type integer) is too inconvenient for you the user. So such coercions happen in `data.table` automatically without warning. Thanks to zero-copy coercion there is no speed penalty, even when calling `set()` many times in a loop, so there's no speed penalty to warn you about either. However, we believe that assigning a character value such as `"2"` into an integer column is more likely to be a user mistake that you would like to be warned about. The type difference (character vs integer) may be the only clue that you have selected the wrong column, or typed the wrong variable to be assigned to that column. For this reason we view character to numeric-like coercion differently and will warn about it. If it is correct, then the warning is intended to nudge you to wrap the RHS with `as.()` so that it is clear to readers of your code that a coercion from character to that type is intended. For example : @@ -145,8 +147,6 @@ 3. Columns containing functions that don't inherit the class `'function'` would fail to group, [#4814](https://github.com/Rdatatable/data.table/issues/4814). Thanks @mb706 for reporting, @ecoRoland2 for helping investigate, and @Coorsaa for a follow-up example involving environments. -4. Fix an issue that assigning an all-NA-vector to a factor column in data.table may result in segfaults, [#4824](https://github.com/Rdatatable/data.table/issues/4824). Thanks to @clerousset for reporting and @shrektan for fixing. - ## NOTES 1. Continuous daily testing by CRAN using latest daily R-devel revealed, within one day of the change to R-devel, that a future version of R would break one of our tests, [#4769](https://github.com/Rdatatable/data.table/issues/4769). The characters "-alike" were added into one of R's error messages, so our too-strict test which expected the error `only defined on a data frame with all numeric variables` will fail when it sees the new error message `only defined on a data frame with all numeric-alike variables`. We have relaxed the pattern the test looks for to `data.*frame.*numeric` well in advance of the future version of R being released. Readers are reminded that CRAN is not just a host for packages. It is also a giant test suite for R-devel. For more information, [behind the scenes of cran, 2016](https://www.h2o.ai/blog/behind-the-scenes-of-cran/).