This bug is related to #495.
```r
library(data.table)
DT = data.table(x=c(1,1,1,2,2), y=1:5)
z = 1:5
options(datatable.verbose=TRUE)
# correct answer (?)
options(datatable.optimize = Inf)
DT[, list(mean(z), mean(y)), by=x]
# GForce optimized j to 'list(gmean(z), gmean(y))'
#    x  V1  V2
# 1: 1 2.0 2.0
# 2: 2 4.5 4.5
# incorrect answer (?)
options(datatable.optimize = 1L) # no GForce
DT[, list(mean(z), mean(y)), by=x]
#    x V1  V2
# 1: 1  3 2.0
# 2: 2  3 4.5
```
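The two answers correspond to two readings of `mean(z)` inside a grouped `j`. A minimal base-R sketch (no data.table involved; `x` here is just a copy of the values in `DT$x`) reproduces both numbers: splitting `z` by group gives the GForce result, while using `z` whole gives the non-GForce result.

```r
# Base R only: the two candidate answers for mean(z) grouped by x
x <- c(1, 1, 1, 2, 2)   # same grouping values as DT$x
z <- 1:5                # the external vector, one value per row

# Reading 1 (GForce result): split z by group, then average each piece
per_group <- tapply(z, x, mean)
print(per_group)        # group 1 -> 2.0, group 2 -> 4.5

# Reading 2 (non-GForce result): z is not split at all,
# so every group sees the whole vector
whole <- mean(z)
print(whole)            # 3
```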
Basically, in the second case `mean` is computed on the entire `z` (here `mean` gets optimised to `fastmean` internally), so both groups get `mean(1:5) = 3`. This is most likely because `z` is not a column of `DT`, so `.SD` doesn't have this variable in it. So it comes back to #495.

For the same reason, calculating the variance or standard deviation of `z` by group won't work either, even with `datatable.optimize = Inf`, because GForce isn't implemented for those functions:
```r
options(datatable.optimize=Inf)
DT[, list(sd(z), sd(y)), by=x]
#    x       V1        V2
# 1: 1 1.581139 1.0000000
# 2: 2 1.581139 0.7071068
```
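Until this is fixed, one possible workaround (a sketch, not an official recommendation) is to make sure the external vector is actually split by group. Option A subsets `z` with `.I`, the row numbers of the current group in `DT`; option B copies `z` into a column (named `z_col` here purely for illustration) so that it becomes part of `.SD`. Both assume `z` has exactly one value per row of `DT`, in the same order.

```r
library(data.table)

DT <- data.table(x = c(1, 1, 1, 2, 2), y = 1:5)
z <- 1:5  # external vector, assumed aligned row-by-row with DT

# Option A: .I holds the row numbers of the current group in DT,
# so z[.I] picks out only that group's values of z
resA <- DT[, list(sd(z[.I]), sd(y)), by = x]

# Option B: store z as a column (hypothetical name z_col) on a copy,
# so it is part of .SD and is split by group just like y
DT2 <- copy(DT)
DT2[, z_col := z]
resB <- DT2[, list(sd(z_col), sd(y)), by = x]

print(resA)  # V1 and V2 should now agree: 1.0000000 and 0.7071068
print(resB)
```

Option A leaves `DT` untouched; option B modifies only the copy `DT2` by reference.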
For now, using external (non-column) variables in `j` together with `by` has a bug. This observation came from this SO post. Thanks to drstevok.