The timings below may look as if they were obtained in a single session, but each one was run in a fresh session, and the kernel caches were dropped in between with sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches'.
library(data.table) ## 1.13.5
setDTthreads(0L) ## 40
set.seed(108)
N = 1e9L
K = 1e2L
DT = list()
DT[["id3"]] = factor(sample(sprintf("id%010d",1:(N/K)), N, TRUE))
DT[["v3"]] = round(runif(N,max=100),6)
setDT(DT)
system.time(naf <- DT[, .(v3=mean(v3)), by=id3, verbose=TRUE])
#Detected that j uses these columns: v3
#Finding groups using forderv ... forder.c received 1000000000 rows and 1 columns
#5.615s elapsed (00:01:39 cpu)
#Finding group sizes from the positions (can be avoided to save RAM) ... 0.091s elapsed (0.074s cpu)
#Getting back original order ... forder.c received a vector type 'integer' length 10000000
#1.037s elapsed (2.888s cpu)
#lapply optimization is on, j unchanged as 'list(mean(v3))'
#GForce optimized j to 'list(gmean(v3))'
#Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.319
#gforce assign high and low took 4.399
#This gsum took (narm=FALSE) ... gather took ... 2.107s
#2.322s
#gforce eval took 2.339
#8.738s elapsed (00:02:39 cpu)
#
# user system elapsed
#261.852 67.723 15.498
system.time(nat <- DT[, .(v3=mean(v3, na.rm=TRUE)), by=id3, verbose=TRUE])
#Detected that j uses these columns: v3
#Finding groups using forderv ... forder.c received 1000000000 rows and 1 columns
#5.799s elapsed (00:01:42 cpu)
#Finding group sizes from the positions (can be avoided to save RAM) ... 0.090s elapsed (0.074s cpu)
#Getting back original order ... forder.c received a vector type 'integer' length 10000000
#2.608s elapsed (3.275s cpu)
#lapply optimization is on, j unchanged as 'list(mean(v3, na.rm = TRUE))'
#GForce optimized j to 'list(gmean(v3, na.rm = TRUE))'
#Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.346
#gforce assign high and low took 4.978
#gforce eval took 33.515
#40.2s elapsed (00:02:24 cpu)
#
# user system elapsed
#250.858 68.804 48.679
This is actually mentioned in #3202.
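For anyone who wants to try the same comparison without a machine that can hold 1e9 rows, here is a scaled-down sketch. The smaller `N` and the names `t_naf` / `t_nat` are my assumptions and were not part of the run above. The gap between the two calls may be much smaller, or absent, at this size or on other data.table versions.

```r
library(data.table)

set.seed(108)
N = 1e6L  # assumption: scaled down from the 1e9L used in the timings above
K = 1e2L

DT = data.table(
  id3 = factor(sample(sprintf("id%010d", 1:(N/K)), N, TRUE)),
  v3  = round(runif(N, max=100), 6)
)

# same two queries as above, without verbose output
t_naf = system.time(naf <- DT[, .(v3=mean(v3)), by=id3])
t_nat = system.time(nat <- DT[, .(v3=mean(v3, na.rm=TRUE)), by=id3])

# v3 contains no NAs, so both results should agree up to floating point tolerance
stopifnot(isTRUE(all.equal(naf, nat)))

# elapsed seconds for each variant
print(c(na.rm_FALSE = t_naf[["elapsed"]], na.rm_TRUE = t_nat[["elapsed"]]))
```

Adding `verbose=TRUE` to either call prints the same per-step breakdown as in the logs above, which is where the "gforce eval took" difference shows up.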