diff --git a/NEWS.md b/NEWS.md
index 1e165bd427..372dfcde64 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -22,6 +22,8 @@
 
 4. Passing `.SD` to `frankv()` with `ties.method='random'` or with `na.last=NA` failed with `.SD is locked`, [#4429](https://github.com/Rdatatable/data.table/issues/4429). Thanks @smarches for the report.
 
+5. Filtering data.table using `which=NA` to return non-matching indices will now properly work for non-optimized subsetting as well, closes [#4411](https://github.com/Rdatatable/data.table/issues/4411).
+
 ## NOTES
 
 1. New feature 29 in v1.12.4 (Oct 2019) introduced zero-copy coercion. Our thinking is that requiring you to get the type right in the case of `0` (type double) vs `0L` (type integer) is too inconvenient for you the user. So such coercions happen in `data.table` automatically without warning. Thanks to zero-copy coercion there is no speed penalty, even when calling `set()` many times in a loop, so there's no speed penalty to warn you about either. However, we believe that assigning a character value such as `"2"` into an integer column is more likely to be a user mistake that you would like to be warned about. The type difference (character vs integer) may be the only clue that you have selected the wrong column, or typed the wrong variable to be assigned to that column. For this reason we view character to numeric-like coercion differently and will warn about it. If it is correct, then the warning is intended to nudge you to wrap the RHS with `as.()` so that it is clear to readers of your code that a coercion from character to that type is intended. For example :
diff --git a/R/data.table.R b/R/data.table.R
index bbc1cf5693..e806850e20 100644
--- a/R/data.table.R
+++ b/R/data.table.R
@@ -553,6 +553,11 @@ replace_dot_alias = function(e) {
       # i is not a data.table
       if (!is.logical(i) && !is.numeric(i)) stop("i has evaluated to type ", typeof(i), ". Expecting logical, integer or double.")
       if (is.logical(i)) {
+        if (is.na(which)) { # #4411 i filter not optimized to join: DT[A > 1, which = NA]
+          ## we need this branch here, not below next to which=TRUE because irows=i=which(i) will filter out NAs: DT[A > 10, which = NA] will be incorrect
+          if (notjoin) stop("internal error: notjoin and which=NA (non-matches), huh? please provide reproducible example to issue tracker") # nocov
+          return(which(is.na(i) | !i))
+        }
         if (length(i)==1L # to avoid unname copy when length(i)==nrow (normal case we don't want to slow down)
           && isTRUE(unname(i))) { irows=i=NULL } # unname() for #2152 - length 1 named logical vector.
         # NULL is efficient signal to avoid creating 1:nrow(x) but still return all rows, fixes #1249
diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw
index 24b513875b..9ae4864fe2 100644
--- a/inst/tests/tests.Rraw
+++ b/inst/tests/tests.Rraw
@@ -17338,3 +17338,11 @@ test(2169.2, DT[ , frankv(.SD, ties.method='random')], 1:10)
 DT[, c('..na_prefix..', '..stats_runif..') := 1L]
 test(2169.3, DT[ , frankv(.SD, ties.method='average', na.last=NA)], error="Input column '..na_prefix..' conflicts")
 test(2169.4, DT[ , frankv(.SD, ties.method='random')], error="Input column '..stats_runif..' conflicts")
+
+# which=NA inconsistent with ?data.table, #4411
+DT = data.table(A = c(NA, 3, 5, 0, 1, 2), B = c("foo", "foo", "foo", "bar", "bar", "bar"))
+test(2170.1, DT[A > 1, which = NA], c(1L,4:5))
+test(2170.2, DT[A > -1, which = NA], 1L)
+test(2170.3, DT[A > -1 | is.na(A), which = NA], integer())
+test(2170.4, DT[A > 10, which = NA], seq_len(nrow(DT)))
+test(2170.5, DT[!(A > 1), which = NA], c(1:3,6L)) # matches DT[A <= 1, which = NA]
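The new branch in `R/data.table.R` returns `which(is.na(i) | !i)` when `i` is a logical filter and `which=NA`. The minimal base-R sketch that follows illustrates why the NA entries must be counted explicitly: `which()` drops NA positions, so neither `which(i)` nor a naive `which(!i)` ever reports the row where `A` is NA. It does not load data.table, and `non_matches` is a hypothetical helper name used only for this illustration; its body is the expression returned by the patch, and the checks mirror the expectations of tests 2170.1 to 2170.5.

```r
# Base-R sketch of the index logic added for which=NA (#4411); runs without data.table.
# `non_matches` is a hypothetical helper; its body is the expression the patch returns.
non_matches <- function(i) which(is.na(i) | !i)

A <- c(NA, 3, 5, 0, 1, 2)  # same values as column A in tests 2170.x

i <- A > 1
print(i)               # NA TRUE TRUE FALSE FALSE TRUE
print(which(i))        # 2 3 6: which() drops the NA in position 1
print(which(!i))       # 4 5: the naive complement also drops the NA row
print(non_matches(i))  # 1 4 5: the NA row is reported as a non-match

# Same expectations as tests 2170.1 to 2170.5, checked on the bare logical vectors
stopifnot(
  identical(non_matches(A > 1), c(1L, 4:5)),
  identical(non_matches(A > -1), 1L),
  identical(non_matches(A > -1 | is.na(A)), integer()),
  identical(non_matches(A > 10), seq_along(A)),
  identical(non_matches(!(A > 1)), c(1:3, 6L))
)
```

This only models the vector arithmetic; it does not reproduce how `[.data.table` parses `i`, handles not-joins, or decides when a filter is optimized into a join.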