I have run into the following situation (simplified for reproducibility):
- I created a function that returns the sum of a vector with NAs removed, or NA if all elements are NA:
sumNA <- function(x) ifelse(all(is.na(x)), NA, sum(x, na.rm = TRUE))
- then I want to apply it in a data.table to only certain columns, grouped by another column:
library(data.table)
dt <- data.table(col1 = c(rep(NA, 2), 1, 2), col2 = rep(c(1,2), each = 2), groupby = c(1,1,2,2))
dt[, lapply(.SD, sumNA), .SDcols = c("col1", "col2"), by = "groupby"]
- this returns an error: col1 is NA for all elements in group 1, but sums to 3 for the elements in group 2. data.table cannot build the result column because the NA returned for group 1 is of type logical, while the result for group 2 is of type double.
- I have fixed this in my case by explicitly coercing the NA to the correct class in my function before returning the result. However, it would be very nice if data.table performed this NA coercion directly; it could save many users from having to code the same workaround in every function that returns NAs and accepts several input classes.
- it also took me a while to figure out what caused the error message, so if implementing the NA class coercion is not easily feasible, a more descriptive error message would also be helpful.
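To make the type mismatch concrete, here is a minimal base-R sketch of the kind of workaround described above. The name `sumNA_typed` is hypothetical, and this is only one possible approach, not necessarily the exact fix used in my code.

```r
# Original helper from this report: ifelse() returns a bare (logical) NA
# when the test is TRUE, regardless of the type of x.
sumNA <- function(x) ifelse(all(is.na(x)), NA, sum(x, na.rm = TRUE))

typeof(sumNA(c(NA_real_, NA_real_)))  # "logical"
typeof(sumNA(c(1, 2)))                # "double"

# Hypothetical variant: compute the sum first, then overwrite it with NA.
# Assigning NA into an existing vector keeps that vector's type.
sumNA_typed <- function(x) {
  s <- sum(x, na.rm = TRUE)        # 0 (or 0L) when every element is NA
  if (all(is.na(x))) s[1L] <- NA   # NA is coerced to the type of s
  s
}

typeof(sumNA_typed(c(NA_real_, NA_real_)))  # "double"
typeof(sumNA_typed(c(NA_integer_, 1L)))     # "integer"
```

Passing `sumNA_typed` instead of `sumNA` to `lapply(.SD, ...)` in the call above should then give every group the same column type, but each such function still has to handle this itself.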
Output of sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_4.1.1 cli_3.1.0 tools_4.1.1 data.table_1.14.2 rlang_1.0.