Just found the following bug:
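```r
library(data.table)

# fails: the splitting column is named 'x' and is a factor
temp1 <- data.table(x = factor("a"))
split(temp1, by = "x")
# Error: is.data.table(x) is not TRUE

# this works (factor column named '.x')
temp2 <- data.table(.x = factor("a"))
split(temp2, by = ".x")

# this works as well (column 'x' is character, not factor)
temp3 <- data.table(x = "a")
split(temp3, by = "x")
```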
So the problem only emerges if the splitting variable is called `x` and is a factor. It can be traced back to the `temp = eval(dtq)` call in `split.data.table`. Inside the unevaluated `dtq` call, `make.levels(x, cols=.cols, sorted=.sorted)` resolves `x` to the column `x` instead of the data.table `x` passed to `split`. A quick fix is to write `make.levels(.___x, cols=.cols, sorted=.sorted)` instead, and to make a temporary assignment `.___x <- x` before `temp = eval(dtq)`.
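The masking can be reproduced without data.table. The sketch below is a hypothetical base-R illustration, assuming (as described above) that `dtq` is evaluated with the columns of `x` in scope and the calling function's frame as the enclosure. `describe()`, `f()` and `g()` are made-up stand-ins for `make.levels()` and `split.data.table`, not actual data.table code: `f()` mirrors the current behaviour and `g()` the proposed `.___x` alias.

```r
# Hypothetical stand-in for make.levels(): reports what it actually received.
describe <- function(obj) if (is.data.frame(obj)) "data.frame" else class(obj)[1]

# Mirrors the current behaviour: columns of x are searched first,
# the function's own argument 'x' only through the enclosure.
f <- function(x) {
  eval(quote(describe(x)), envir = x, enclos = environment())
}

# Mirrors the proposed fix: refer to the table through an unusual alias.
g <- function(x) {
  .___x <- x  # temporary alias that no column is expected to share
  eval(quote(describe(.___x)), envir = x, enclos = environment())
}

df1 <- data.frame(x = factor("a"))
df2 <- data.frame(.x = factor("a"))
f(df1)  # "factor": the column x masks the argument x
g(df1)  # "data.frame": the alias is not masked
f(df2)  # "data.frame": no column named x, so the argument is found
```

The alias only helps as long as no column is itself named `.___x`, which is why a deliberately unusual name is chosen.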
`sessionInfo()` output (the bug is also present in 1.11.8):
```
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 18.2
Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.11.9 switchr_0.13.0
loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1 RCurl_1.95-4.11 remotes_2.0.2
[5] bitops_1.0-6
```