
Error when ".SDcols" and "by" are used in one call and the function returns NA for one group but not for another #5341

@CGlemser

Description


I have run into the following situation (simplified for reproducibility):

  • I wrote a function that returns the sum of a vector with NAs removed, or NA if all elements are NA:
    sumNA <- function(x) ifelse(all(is.na(x)), NA, sum(x, na.rm = TRUE))
  • I then apply it to selected columns of a data.table, grouped by another column:
    library(data.table)
    dt <- data.table(col1 = c(rep(NA, 2), 1, 2), col2 = rep(c(1,2), each = 2), groupby = c(1,1,2,2))
    dt[, lapply(.SD, sumNA), .SDcols = c("col1", "col2"), by = "groupby"]
  • This throws an error because col1 is NA for every row in group 1 but sums to 3 in group 2. The function returns a logical NA for group 1 and a double for group 2, so data.table cannot combine the per-group results into a single column.
  • I fixed this in my case by explicitly coercing the NA to the correct class inside my function before returning it. However, it would be very convenient if data.table performed this NA coercion itself; it could save many users from writing the same workaround in every function that can return NA and accepts inputs of several classes.
  • It also took me a while to figure out what caused the error, so if automatic NA coercion is not easily feasible, I think a more descriptive error message would also help.
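A minimal sketch of the mismatch and of the workaround described above, assuming data.table is installed. It first checks which type the original `sumNA()` returns for col1 in each group, showing the logical/double split, and then defines a type-stable variant. `sumNA_typed()` is a hypothetical name used only for illustration; it is not part of data.table, and it is one possible way to do the coercion, not necessarily the reporter's exact fix.

```r
library(data.table)

# Original function: ifelse() yields a *logical* NA when every element is NA
sumNA <- function(x) ifelse(all(is.na(x)), NA, sum(x, na.rm = TRUE))

dt <- data.table(col1 = c(rep(NA, 2), 1, 2),
                 col2 = rep(c(1, 2), each = 2),
                 groupby = c(1, 1, 2, 2))

# Inspect the type sumNA() returns for col1 in each group:
# "logical" for group 1 (all NA), "double" for group 2
types <- dt[, .(type = typeof(sumNA(col1))), by = "groupby"]
print(types)

# Hypothetical type-stable variant: compute the sum first, then mark it NA.
# `is.na<-` keeps the vector's type, so the NA matches sum()'s result type
# (double for double input, integer for integer or logical input).
sumNA_typed <- function(x) {
  s <- sum(x, na.rm = TRUE)
  if (all(is.na(x))) is.na(s) <- TRUE
  s
}

res <- dt[, lapply(.SD, sumNA_typed), .SDcols = c("col1", "col2"), by = "groupby"]
print(res)
```

Deriving the NA from `sum()`'s own result avoids hard-coding `NA_real_`, so the same function stays type-consistent across groups for both integer and double columns.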

Output of sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C LC_TIME=German_Germany.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_4.1.1 cli_3.1.0 tools_4.1.1 data.table_1.14.2 rlang_1.0.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
