Speed up DT[,.N,by=]. Currently evaluates j per group. #1251

@mattdowle

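library(data.table)
# 1e8 rows; every value of a is unique, so by=a yields 1e8 single-row groups
DT = data.table(a=1:1e8, b=1:2)
# verbose=TRUE prints the timing breakdown shown below
DT[,.N,by=a,verbose=TRUE]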
# Detected that j uses these columns: <none> 
# Finding groups (bysameorder=FALSE) ... done in 0.757secs. bysameorder=TRUE and o__ is length 0
# Optimization is on but left j unchanged (single plain symbol): '.N'
# Starting dogroups ... 
#   memcpy contiguous groups took 12.143s for 100000000 groups
#   eval(j) took 13.800s for 100000000 calls
# done dogroups in 55.253 secs
#                a N
# 1e+00:         1 1
# 2e+00:         2 1
# 3e+00:         3 1
# 4e+00:         4 1
# 5e+00:         5 1
#    ---            
# 1e+08:  99999996 1
# 1e+08:  99999997 1
# 1e+08:  99999998 1
# 1e+08:  99999999 1
# 1e+08: 100000000 1
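
The log above shows `eval(j)` being called once per group even though `j` is just `.N`. As a rough illustration of why that work is avoidable, the sketch below derives the same counts from the grouping alone on a tiny table. The table size and the helper `count_by_first_appearance` are assumptions made for illustration only; this is not how data.table implements, or would implement, the optimization.

```r
library(data.table)

# Small stand-in for the 1e8-row table in the report (size chosen so the
# example runs instantly; an assumption, not the original data).
DT = data.table(a = c(3L, 1L, 3L, 2L, 1L, 3L), b = 1:2)

# Reference result: the grouped count exactly as written in the issue.
ref = DT[, .N, by = a]

# Hypothetical helper: compute group sizes from the grouping vector alone,
# with no per-group evaluation of j.
count_by_first_appearance = function(x) {
  u = unique(x)                                  # groups in order of first appearance
  n = tabulate(match(x, u), nbins = length(u))   # size of each group
  data.table(a = u, N = n)
}

alt = count_by_first_appearance(DT$a)

# Both approaches should agree on group order and group sizes.
stopifnot(identical(ref$a, alt$a), identical(ref$N, alt$N))
```

The helper makes a single pass over the grouping column, which is the kind of shortcut the issue title asks for in place of 1e8 separate `eval(j)` calls.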
