gprod produces wrong results for large integer64 and for prod(na.rm=TRUE)
library(data.table)
library(bit64)
DT = data.table(x=c(lim.integer64(), 1, 1), g=1:2)
DT
#> x g
#> 1: -9223372036854775807 1
#> 2: 9223372036854775807 2
#> 3: 1 1
#> 4: 1 2
DT[, prod(x), g, verbose=TRUE]
#> Argument 'by' after substitute: g
#> Detected that j uses these columns: [x]
#> Finding groups using forderv ... forder.c received 4 rows and 1 columns
#> 0.000s elapsed (0.000s cpu)
#> Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu)
#> Getting back original order ... forder.c received a vector type 'integer' length 2
#> 0.000s elapsed (0.000s cpu)
#> lapply optimization is on, j unchanged as 'prod(x)'
#> GForce optimized j to 'gprod(x)'
#> Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.000
#> gforce assign high and low took 0.001
#> gforce eval took 0.000
#> 0.000s elapsed (0.001s cpu)
#> g V1
#> 1: 1 <NA>
#> 2: 2 9221120237041092514
DT[, base::prod(x), g, verbose=TRUE]
#> Argument 'by' after substitute: g
#> Detected that j uses these columns: [x]
#> Finding groups using forderv ... forder.c received 4 rows and 1 columns
#> 0.000s elapsed (0.000s cpu)
#> Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu)
#> Getting back original order ... forder.c received a vector type 'integer' length 2
#> 0.001s elapsed (0.000s cpu)
#> lapply optimization is on, j unchanged as 'base::prod(x)'
#> GForce is on, left j unchanged
#> Old mean optimization is on, left j unchanged.
#> Making each group and running j (GForce FALSE) ...
#> collecting discontiguous groups took 0.000s for 2 groups
#> eval(j) took 0.000s for 2 calls
#> 0.000s elapsed (0.000s cpu)
#> g V1
#> 1: 1 -9223372036854775807
#> 2: 2 9223372036854775807
edit1:
It also does not handle na.rm=TRUE correctly for integer64.
library(bit64)
DT = data.table(x=as.integer64(c(1:2, NA, NA)), g=1:2)
DT
#> x g
#> 1: 1 1
#> 2: 2 2
#> 3: <NA> 1
#> 4: <NA> 2
DT[, prod(x, na.rm=TRUE), g]
#> g V1
#> 1: 1 <NA>
#> 2: 2 <NA>
DT[, base::prod(x, na.rm=TRUE), g]
#> g V1
#> 1: 1 1
#> 2: 2 2
edit2:
OK, it doesn't even have to be a "large" integer64:
DT = data.table(x=as.integer64(c(2, -2, 1, 1)), g=1:2)
DT
#> x g
#> 1: 2 1
#> 2: -2 2
#> 3: 1 1
#> 4: 1 2
DT[, prod(x), g]
#> g V1
#> 1: 1 0
#> 2: 2 -2
DT[, base::prod(x), g]
#> g V1
#> 1: 1 2
#> 2: 2 -2
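Until this is fixed, a possible workaround is to keep GForce from rewriting the call to gprod. A minimal sketch, reusing the data from edit2: either namespace-qualify the call as in the examples above, or temporarily lower the documented `datatable.optimize` option (I am assuming level 1 skips GForce, as the option's help page describes). The names `res_base`, `res_noGForce` and `old` are just illustrative.

```r
library(data.table)
library(bit64)

DT = data.table(x=as.integer64(c(2, -2, 1, 1)), g=1:2)

# Workaround 1: base::prod(x) is not rewritten to gprod(x) by GForce
# (the verbose output above reports "GForce is on, left j unchanged")
res_base = DT[, base::prod(x), g]

# Workaround 2 (assumption: optimization level 1 skips GForce):
# lower the level for this query only, then restore the previous setting
old = options(datatable.optimize=1L)
res_noGForce = DT[, prod(x), g]
options(old)

res_base
res_noGForce
```

Both calls go through the regular per-group evaluation, so they should agree with the base::prod(x) results shown above rather than with the gprod ones.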
sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=de_AT.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=de_AT.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=de_AT.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] bit64_4.0.5 bit_4.0.4 data.table_1.14.3
#>
#> loaded via a namespace (and not attached):
#> [1] knitr_1.34 magrittr_2.0.1 rlang_0.4.11 fastmap_1.1.0
#> [5] fansi_0.5.0 stringr_1.4.0 styler_1.6.1 highr_0.9
#> [9] tools_4.1.1 xfun_0.26 utf8_1.2.2 withr_2.4.2
#> [13] htmltools_0.5.2 ellipsis_0.3.2 yaml_2.2.1 digest_0.6.27
#> [17] tibble_3.1.4 lifecycle_1.0.0 crayon_1.4.1 purrr_0.3.4
#> [21] vctrs_0.3.8 fs_1.5.0 glue_1.4.2 evaluate_0.14
#> [25] rmarkdown_2.11 reprex_2.0.1 stringi_1.7.4 compiler_4.1.1
#> [29] pillar_1.6.2 backports_1.2.1 pkgconfig_2.0.3