While running the grouping benchmark on a 2e9-row dataset (96GB csv) using the recent stable data.table 1.13.2, I get the following exception:
> system.time( DT[, sum(v1), keyby=id1] )
Error in gforce(thisEnv, jsub, o__, f__, len__, irows) :
Internal error: Failed to allocate counts or TMP when assigning g in gforce
Calls: system.time -> [ -> [.data.table -> gforce
The error is raised by this check in the C source:
if (!counts || !TMP ) error(_("Internal error: Failed to allocate counts or TMP when assigning g in gforce"));
It is the same machine as the one used in 2014: 32 cores and 244GB memory.
I also ran data.table 1.9.2, to confirm that the version which previously worked fine for this data size continues to work on a recent R version.
> system.time( DT[, sum(v1), keyby=id1] )
user system elapsed
58.113 17.098 75.219
> system.time( DT[, sum(v1), keyby=id1] )
user system elapsed
59.185 15.303 74.496
> system.time( DT[, sum(v1), keyby="id1,id2"] )
user system elapsed
180.160 19.953 200.137
> system.time( DT[, sum(v1), keyby="id1,id2"] )
user system elapsed
204.208 39.651 243.889
> system.time( DT[, list(sum(v1),mean(v3)), keyby=id3] )
user system elapsed
1037.451 51.269 1088.853
> system.time( DT[, list(sum(v1),mean(v3)), keyby=id3] )
user system elapsed
1023.068 29.556 1052.753
> system.time( DT[, lapply(.SD, mean), keyby=id4, .SDcols=7:9] )
user system elapsed
73.123 18.026 91.160
> system.time( DT[, lapply(.SD, mean), keyby=id4, .SDcols=7:9] )
user system elapsed
70.523 8.951 79.483
> system.time( DT[, lapply(.SD, sum), keyby=id6, .SDcols=7:9] )
user system elapsed
489.294 36.192 525.548
> system.time( DT[, lapply(.SD, sum), keyby=id6, .SDcols=7:9] )
user system elapsed
488.316 28.808 517.188
Timings are slower than they were in the past, but AFAIK this matches what we observed in other issues: a newer version of R introduced overhead that later data.table versions addressed. So users who upgrade R should also upgrade data.table.
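For anyone who wants to run the same queries without the 96GB file, here is a scaled-down sketch. The column layout (id1..id6 followed by v1..v3, so that .SDcols=7:9 picks the three value columns) is an assumption inferred from the queries above, and N, K and the generated values are purely illustrative. At this size it is not expected to hit the allocation error, which was only seen at 2e9 rows; it is only meant for comparing behaviour and timings across data.table versions.

```r
library(data.table)

# Illustrative sizes only (assumption); the report uses 2e9 rows.
N <- 1e6
K <- 100
set.seed(1)

# Assumed layout: six id columns, then three value columns (columns 7:9).
DT <- data.table(
  id1 = sample(sprintf("id%03d", 1:K), N, TRUE),        # few character groups
  id2 = sample(sprintf("id%03d", 1:K), N, TRUE),        # few character groups
  id3 = sample(sprintf("id%010d", 1:(N/K)), N, TRUE),   # many character groups
  id4 = sample(K, N, TRUE),                             # few integer groups
  id5 = sample(K, N, TRUE),                             # few integer groups
  id6 = sample(N/K, N, TRUE),                           # many integer groups
  v1  = sample(5, N, TRUE),
  v2  = sample(5, N, TRUE),
  v3  = sample(round(runif(100, max = 100), 4), N, TRUE)
)

# Record the version under test, then run the same queries as in the report.
print(packageVersion("data.table"))
print(system.time( r1 <- DT[, sum(v1), keyby=id1] ))
print(system.time( r2 <- DT[, sum(v1), keyby="id1,id2"] ))
print(system.time( r3 <- DT[, list(sum(v1),mean(v3)), keyby=id3] ))
print(system.time( r4 <- DT[, lapply(.SD, mean), keyby=id4, .SDcols=7:9] ))
print(system.time( r5 <- DT[, lapply(.SD, sum), keyby=id6, .SDcols=7:9] ))
```

Raising N towards the original size on a machine with enough memory would be the way to check whether the gforce allocation error reproduces.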
The check that raises the error is at data.table/src/gsumm.c, line 116 in commit 8480b6a.