gprod 64bit produces wrong results #5225

@ben-schwen

gprod produces wrong results for large integer64 values and for prod(x, na.rm=TRUE) on integer64 columns.

library(data.table)
library(bit64)
DT = data.table(x=c(lim.integer64(), 1, 1), g=1:2)
DT
#>                       x g
#> 1: -9223372036854775807 1
#> 2:  9223372036854775807 2
#> 3:                    1 1
#> 4:                    1 2
DT[, prod(x), g, verbose=TRUE]
#> Argument 'by' after substitute: g
#> Detected that j uses these columns: [x]
#> Finding groups using forderv ... forder.c received 4 rows and 1 columns
#> 0.000s elapsed (0.000s cpu) 
#> Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu) 
#> Getting back original order ... forder.c received a vector type 'integer' length 2
#> 0.000s elapsed (0.000s cpu) 
#> lapply optimization is on, j unchanged as 'prod(x)'
#> GForce optimized j to 'gprod(x)'
#> Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.000
#> gforce assign high and low took 0.001
#> gforce eval took 0.000
#> 0.000s elapsed (0.001s cpu)
#>    g                  V1
#> 1: 1                <NA>
#> 2: 2 9221120237041092514
DT[, base::prod(x), g, verbose=TRUE]
#> Argument 'by' after substitute: g
#> Detected that j uses these columns: [x]
#> Finding groups using forderv ... forder.c received 4 rows and 1 columns
#> 0.000s elapsed (0.000s cpu) 
#> Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu) 
#> Getting back original order ... forder.c received a vector type 'integer' length 2
#> 0.001s elapsed (0.000s cpu) 
#> lapply optimization is on, j unchanged as 'base::prod(x)'
#> GForce is on, left j unchanged
#> Old mean optimization is on, left j unchanged.
#> Making each group and running j (GForce FALSE) ... 
#>   collecting discontiguous groups took 0.000s for 2 groups
#>   eval(j) took 0.000s for 2 calls
#> 0.000s elapsed (0.000s cpu)
#>    g                   V1
#> 1: 1 -9223372036854775807
#> 2: 2  9223372036854775807

edit1:
It also does not handle na.rm=TRUE correctly for integer64.

library(bit64)
DT = data.table(x=as.integer64(c(1:2, NA, NA)), g=1:2)
DT
#>       x g
#> 1:    1 1
#> 2:    2 2
#> 3: <NA> 1
#> 4: <NA> 2
DT[, prod(x, na.rm=TRUE), g]
#>    g   V1
#> 1: 1 <NA>
#> 2: 2 <NA>
DT[, base::prod(x, na.rm=TRUE), g]
#>    g V1
#> 1: 1  1
#> 2: 2  2

edit2:
OK, the values don't even have to be "large" integer64:

DT = data.table(x=as.integer64(c(2, -2, 1, 1)), g=1:2)
DT
#>     x g
#> 1:  2 1
#> 2: -2 2
#> 3:  1 1
#> 4:  1 2
DT[, prod(x), g]
#>    g V1
#> 1: 1  0
#> 2: 2 -2
DT[, base::prod(x), g]
#>    g V1
#> 1: 1  2
#> 2: 2 -2
sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=de_AT.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=de_AT.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=de_AT.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] bit64_4.0.5       bit_4.0.4         data.table_1.14.3
#> 
#> loaded via a namespace (and not attached):
#>  [1] knitr_1.34      magrittr_2.0.1  rlang_0.4.11    fastmap_1.1.0  
#>  [5] fansi_0.5.0     stringr_1.4.0   styler_1.6.1    highr_0.9      
#>  [9] tools_4.1.1     xfun_0.26       utf8_1.2.2      withr_2.4.2    
#> [13] htmltools_0.5.2 ellipsis_0.3.2  yaml_2.2.1      digest_0.6.27  
#> [17] tibble_3.1.4    lifecycle_1.0.0 crayon_1.4.1    purrr_0.3.4    
#> [21] vctrs_0.3.8     fs_1.5.0        glue_1.4.2      evaluate_0.14  
#> [25] rmarkdown_2.11  reprex_2.0.1    stringi_1.7.4   compiler_4.1.1 
#> [29] pillar_1.6.2    backports_1.2.1 pkgconfig_2.0.3

Labels: GForce (issues relating to optimized grouping calculations), bit64
