From 39dd122812eafca065d37a16fa0702bd1110be8b Mon Sep 17 00:00:00 2001 From: jangorecki Date: Wed, 6 May 2020 02:17:00 +0100 Subject: [PATCH] which=NA for not optimized i, closes #4411 --- NEWS.md | 2 ++ R/data.table.R | 5 +++++ inst/tests/tests.Rraw | 8 ++++++++ 3 files changed, 15 insertions(+) diff --git a/NEWS.md b/NEWS.md index 026f06c86e..97c349c54d 100644 --- a/NEWS.md +++ b/NEWS.md @@ -109,6 +109,8 @@ unit = "s") 13. A relatively rare case of segfault when combining non-equi joins with `by=.EACHI` is now fixed, closes [#4388](https://github.com/Rdatatable/data.table/issues/4388). +14. Filtering data.table using `which=NA` to return non-matching indices will now properly work for non-optimized subsetting as well, closes [#4411](https://github.com/Rdatatable/data.table/issues/4411). + ## NOTES 0. Retrospective license change permission was sought from and granted by 4 contributors who were missed in [PR#2456](https://github.com/Rdatatable/data.table/pull/2456), [#4140](https://github.com/Rdatatable/data.table/pull/4140). We had used [GitHub's contributor page](https://github.com/Rdatatable/data.table/graphs/contributors) which omits 3 of these due to invalid email addresses, unlike GitLab's contributor page which includes the ids. The 4th omission was a PR to a script which should not have been excluded; a script is code too. We are sorry these contributors were not properly credited before. They have now been added to the contributors list as displayed on CRAN. All the contributors of code to data.table hold its copyright jointly; your contributions belong to you. You contributed to data.table when it had a particular license at that time, and you contributed on that basis. This is why in the last license change, all contributors of code were consulted and each had a veto. diff --git a/R/data.table.R b/R/data.table.R index 98651d6e0b..dac973d81f 100644 --- a/R/data.table.R +++ b/R/data.table.R @@ -549,6 +549,11 @@ replace_dot_alias = function(e) { # i is not a data.table if (!is.logical(i) && !is.numeric(i)) stop("i has evaluated to type ", typeof(i), ". Expecting logical, integer or double.") if (is.logical(i)) { + if (is.na(which)) { # #4411 i filter not optimized to join: DT[A > 1, which = NA] + ## we need this branch here, not below next to which=TRUE because irows=i=which(i) will filter out NAs: DT[A > 10, which = NA] will be incorrect + if (notjoin) stop("internal error: notjoin and which=NA (non-matches), huh? please provide reproducible example to issue tracker") # nocov + return(which(is.na(i) | !i)) + } if (length(i)==1L # to avoid unname copy when length(i)==nrow (normal case we don't want to slow down) && isTRUE(unname(i))) { irows=i=NULL } # unname() for #2152 - length 1 named logical vector. # NULL is efficient signal to avoid creating 1:nrow(x) but still return all rows, fixes #1249 diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw index ed17470383..4e17da84ec 100644 --- a/inst/tests/tests.Rraw +++ b/inst/tests/tests.Rraw @@ -16853,3 +16853,11 @@ A = data.table(A=c(complex(real = 1:3, imaginary=c(0, -1, 1)), NaN)) test(2138.3, rbind(A,B), data.table(A=c(as.character(A$A), B$A))) A = data.table(A=as.complex(rep(NA, 5))) test(2138.4, rbind(A,B), data.table(A=c(as.character(A$A), B$A))) + +# which=NA inconsistent with ?data.table #4411 +DT = data.table(A = c(NA, 3, 5, 0, 1, 2), B = c("foo", "foo", "foo", "bar", "bar", "bar")) +test(2139.1, DT[A > 1, which = NA], c(1L,4:5)) +test(2139.2, DT[A > -1, which = NA], 1L) +test(2139.3, DT[A > -1 | is.na(A), which = NA], integer()) +test(2139.4, DT[A > 10, which = NA], seq_len(nrow(DT))) +test(2139.5, DT[!(A > 1), which = NA], c(1:3,6L)) # matches DT[A <= 1, which = NA]