R 3.4.3 -> 3.5 performance drop #2962

@s-Nick-s

I've noticed a significant drop in data.table performance after upgrading to R 3.5.
For example:

library(data.table)

set.seed(1)
types <- c("A", "B", "C", "D", "E", "F")
obs <- 4e7
# 40 million rows: a numeric column rounded to 2 digits and a 6-level factor
dt1 <- data.table(percent = round(runif(obs, min = 0, max = 1), digits = 2),
                  type = as.factor(sample(types, obs, replace = TRUE)))
# Time a grouped sum by factor level, 30 repetitions
microbenchmark::microbenchmark(
  test1 <- dt1[, list(percent_total = sum(percent)), by = type]
, times = 30)

On R 3.5.0 this gives me:

Unit: milliseconds
                                                          expr    min      lq     mean   median       uq     max neval
 test1 <- dt1[, list(percent_total = sum(percent)), by = type] 522.53 539.668 576.2352 572.8301 602.0069 677.892    30
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.11.4

loaded via a namespace (and not attached):
[1] microbenchmark_1.4-4 compiler_3.5.0       tools_3.5.0  

But on R 3.4.3 the same benchmark is ~20% faster (median of about 470 ms vs. about 573 ms on 3.5.0):

Unit: milliseconds
                                                          expr      min       lq     mean   median       uq      max neval
 test1 <- dt1[, list(percent_total = sum(percent)), by = type] 415.5938 445.8137 477.6562 470.0536 499.8974 576.9203    30
> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=C              LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.11.4

loaded via a namespace (and not attached):
[1] microbenchmark_1.4-4 compiler_3.4.3       tools_3.4.3   
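
To make the size of the gap explicit, the two median timings reported above can be compared directly. This is only a small base-R sketch; the variable names are illustrative and the numbers are copied from the microbenchmark outputs in this report:

```r
# Median timings in milliseconds, copied from the two benchmark outputs above
median_r35 <- 572.8301  # R 3.5.0
median_r34 <- 470.0536  # R 3.4.3

# Extra time R 3.5.0 needs relative to R 3.4.3, in percent
slowdown_pct <- (median_r35 / median_r34 - 1) * 100

# Time saved on R 3.4.3 relative to R 3.5.0, in percent
time_saved_pct <- (1 - median_r34 / median_r35) * 100

print(round(c(slowdown_pct = slowdown_pct, time_saved_pct = time_saved_pct), 1))
```

Depending on which version is taken as the baseline, the difference is roughly 22% (3.5.0 slower) or roughly 18% (3.4.3 faster), consistent with the "~20%" figure.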
