I've noticed a significant drop in data.table performance after upgrading to R 3.5.
For example:
library(data.table)

set.seed(1)
types <- c("A", "B", "C", "D", "E", "F")
obs <- 4e7
# 40 million rows: a rounded uniform value and a six-level factor
dt1 <- data.table(percent = round(runif(obs, min = 0, max = 1), digits = 2),
                  type = as.factor(sample(types, obs, replace = TRUE)))
# Time a grouped sum over the factor column
microbenchmark::microbenchmark(
  test1 <- dt1[, list(percent_total = sum(percent)), by = type],
  times = 30)
On R 3.5.0 this gives me:
Unit: milliseconds
expr min lq mean median uq max neval
test1 <- dt1[, list(percent_total = sum(percent)), by = type] 522.53 539.668 576.2352 572.8301 602.0069 677.892 30
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.11.4
loaded via a namespace (and not attached):
[1] microbenchmark_1.4-4 compiler_3.5.0 tools_3.5.0
But on R 3.4.3 the same call is roughly 20% faster (median about 470 ms versus about 573 ms):
Unit: milliseconds
expr min lq mean median uq max neval
test1 <- dt1[, list(percent_total = sum(percent)), by = type] 415.5938 445.8137 477.6562 470.0536 499.8974 576.9203 30
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=C LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.11.4
loaded via a namespace (and not attached):
[1] microbenchmark_1.4-4 compiler_3.4.3 tools_3.4.3
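For anyone trying to reproduce this, a scaled-down variant of the benchmark may help narrow things down. This is only a sketch under my own assumptions: the reduced `obs`, pinning data.table to one thread with `setDTthreads(1)`, and timing with base `system.time()` instead of microbenchmark are not part of the runs above. They are only there to rule out threading and benchmarking overhead as confounders.

```r
library(data.table)

# Assumption: pin data.table to a single thread so both R versions
# do the same amount of work per call.
old_threads <- getDTthreads()
setDTthreads(1)

set.seed(1)
types <- c("A", "B", "C", "D", "E", "F")
obs <- 1e5  # reduced from 4e7 so the sketch runs quickly
dt1 <- data.table(percent = round(runif(obs, min = 0, max = 1), digits = 2),
                  type = as.factor(sample(types, obs, replace = TRUE)))

# Time the same grouped sum using base R only
timing <- system.time(
  test1 <- dt1[, list(percent_total = sum(percent)), by = type]
)

print(R.version.string)      # which R build produced this timing
print(timing[["elapsed"]])   # elapsed seconds for one run

# Restore the previous thread setting
setDTthreads(old_threads)
```

Running the same script under both R versions, and then raising `obs` back toward 4e7, should show whether the gap persists once the thread count is fixed.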