Rdatatable · mattdowle · Nov 16, 2021 · Oct 19, 2021 · Oct 21, 2021 · Oct 21, 2021
@@ -448,9 +448,11 @@
     # 2:          2021-02-03  # was 18661
     # 3: 4611686018427387906  # was error 'please use as.character'
     ```
-    
+
 47. `tables()` failed with `argument "..." is missing` when called from within a function taking `...`; e.g. `function(...) { tables() }`, [#5197](https://github.com/Rdatatable/data.table/issues/5197). Thanks @greg-minshall for the report and @michaelchirico for the fix.
 
+48. `DT[, prod(int64Col), by=grp]` produced wrong results for `bit64::integer64` columns because the GForce-optimized `prod` treated the values as ordinary doubles, [#5225](https://github.com/Rdatatable/data.table/issues/5225). Thanks to Benjamin Schwendinger for reporting and fixing.
+
 ## NOTES
 
 1. New feature 29 in v1.12.4 (Oct 2019) introduced zero-copy coercion. Our thinking is that requiring you to get the type right in the case of `0` (type double) vs `0L` (type integer) is too inconvenient for you the user. So such coercions happen in `data.table` automatically without warning. Thanks to zero-copy coercion there is no speed penalty, even when calling `set()` many times in a loop, so there's no speed penalty to warn you about either. However, we believe that assigning a character value such as `"2"` into an integer column is more likely to be a user mistake that you would like to be warned about. The type difference (character vs integer) may be the only clue that you have selected the wrong column, or typed the wrong variable to be assigned to that column. For this reason we view character to numeric-like coercion differently and will warn about it. If it is correct, then the warning is intended to nudge you to wrap the RHS with `as.<type>()` so that it is clear to readers of your code that a coercion from character to that type is intended. For example:

@@ -18348,3 +18348,11 @@ test(2225.1, groupingsets(data.table(iris), j=sum(Sepal.Length), by=c('Sp'='Spec
 test(2225.2, groupingsets(data.table(iris), j=mean(Sepal.Length), by=c('Sp'='Species'), sets=list('Species')),
              groupingsets(data.table(iris), j=mean(Sepal.Length), by=c('Species'), sets=list('Species')))
 
+# make gprod work for bit64, #5225
+if (test_bit64) {
+  test(2226.1, base::prod(2147483647L,2L), 4294967294)  # just to illustrate that base returns double
+  DT = data.table(x=c(lim.integer64(), 2, 1, NA, NA, -2, 4), g=INT(1,2,1,2,1,2,3,3))
+  test(2226.2, DT[, prod(x), g],            data.table(g=1:3, V1=as.integer64(c(NA,NA,-8L))))
+  test(2226.3, DT[, prod(x,na.rm=TRUE), g], data.table(g=1:3, V1=as.integer64(c(NA,"9223372036854775807",-8L))))
+}
+
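To see why the integer64 tests above are needed, it may help to look at what happens when an integer64 payload is read through a plain `double` pointer, as the removed `REALSXP` branch did. The following is a minimal standalone sketch, not code from this PR: the helper names (`int64_bits_as_double`, `double_bits_as_int64`, `misread_prod`) are hypothetical, it assumes IEEE-754 binary64 doubles, and it multiplies in `double` rather than in the `long double` accumulator that `gprod` uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the 8 bytes of an int64 as a double (no numeric conversion).
   This mimics reading an integer64 column through `const double *`. */
double int64_bits_as_double(int64_t v) {
  assert(sizeof(double) == sizeof(int64_t));
  double d;
  memcpy(&d, &v, sizeof d);
  return d;
}

/* Reinterpret the 8 bytes of a double as an int64 (no numeric conversion).
   This mimics a double result being displayed as integer64. */
int64_t double_bits_as_int64(double d) {
  int64_t v;
  memcpy(&v, &d, sizeof v);
  return v;
}

/* Hypothetical helper: multiply two integer64 payloads as if they were
   doubles, then read the product's bits back as an integer64. */
int64_t misread_prod(int64_t a, int64_t b) {
  /* volatile forces the product to be rounded to a real 64-bit double */
  volatile double p = int64_bits_as_double(a) * int64_bits_as_double(b);
  return double_bits_as_int64(p);
}
```

Under these assumptions, small integers such as 2 and 4 read as tiny subnormal doubles whose product underflows, so `misread_prod(2, 4)` gives 0 rather than 8. If the NA sentinel is `INT64_MIN`, its bit pattern reads as `-0.0`, which an `ISNAN` check does not catch.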
@@ -1114,13 +1114,10 @@ SEXP gprod(SEXP x, SEXP narmArg) {
   const bool nosubset = irowslen==-1;
   const int n = nosubset ? length(x) : irowslen;
   //clock_t start = clock();
-  SEXP ans;
   if (nrow != n) error(_("nrow [%d] != length(x) [%d] in %s"), nrow, n, "gprod");
   long double *s = malloc(ngrp * sizeof(long double));
   if (!s) error(_("Unable to allocate %d * %d bytes for gprod"), ngrp, sizeof(long double));
   for (int i=0; i<ngrp; ++i) s[i] = 1.0;
-  ans = PROTECT(allocVector(REALSXP, ngrp));
-  double *ansd = REAL(ans);
   switch(TYPEOF(x)) {
   case LGLSXP: case INTSXP: {
     const int *xd = INTEGER(x);
@@ -1135,31 +1132,53 @@ SEXP gprod(SEXP x, SEXP narmArg) {
     }}
     break;
   case REALSXP: {
-    const double *xd = REAL(x);
-    for (int i=0; i<n; ++i) {
-      const int thisgrp = grp[i];
-      const double elem = nosubset ? xd[i] : (irows[i]==NA_INTEGER ? NA_REAL : xd[irows[i]-1]);
-      if (ISNAN(elem)) {
-        if (!narm) s[thisgrp] = NA_REAL;
-        continue;
+    if (INHERITS(x, char_integer64)) {
+      const int64_t *xd = (const int64_t *)REAL(x);
+      for (int i=0; i<n; ++i) {
+        const int thisgrp = grp[i];
+        const int64_t elem = nosubset ? xd[i] : (irows[i]==NA_INTEGER ? NA_INTEGER64 : xd[irows[i]-1]);
+        if (elem==NA_INTEGER64) {
+          if (!narm) s[thisgrp] = NA_REAL;
+          continue;
+        }
+        s[thisgrp] *= elem;
       }
-      s[thisgrp] *= elem;
-    }}
-    break;
+    } else {
+      const double *xd = REAL(x);
+      for (int i=0; i<n; ++i) {
+        const int thisgrp = grp[i];
+        const double elem = nosubset ? xd[i] : (irows[i]==NA_INTEGER ? NA_REAL : xd[irows[i]-1]);
+        if (ISNAN(elem)) {
+          if (!narm) s[thisgrp] = NA_REAL;
+          continue;
+        }
+        s[thisgrp] *= elem;
+      }
+    }
+  } break;
   default:
     free(s);
     error(_("Type '%s' is not supported by GForce %s. Either add the prefix %s or turn off GForce optimization using options(datatable.optimize=1)"), type2char(TYPEOF(x)), "prod (gprod)", "base::prod(.)");
   }
-  for (int i=0; i<ngrp; ++i) {
-    if (s[i] > DBL_MAX) ansd[i] = R_PosInf;
-    else if (s[i] < -DBL_MAX) ansd[i] = R_NegInf;
-    else ansd[i] = (double)s[i];
+  SEXP ans = PROTECT(allocVector(REALSXP, ngrp));
+  if (INHERITS(x, char_integer64)) {
+    int64_t *ansd = (int64_t *)REAL(ans);
+    for (int i=0; i<ngrp; ++i) {
+      ansd[i] = (s[i]>INT64_MAX || s[i]<=INT64_MIN) ? NA_INTEGER64 : (int64_t)s[i];
+    }
+  } else {
+    double *ansd = REAL(ans);
+    for (int i=0; i<ngrp; ++i) {
+      if (s[i] > DBL_MAX) ansd[i] = R_PosInf;
+      else if (s[i] < -DBL_MAX) ansd[i] = R_NegInf;
+      else ansd[i] = (double)s[i];
+    }
   }
   free(s);
   copyMostAttrib(x, ans);
   UNPROTECT(1);
   // Rprintf(_("this gprod took %8.3f\n"), 1.0*(clock()-start)/CLOCKS_PER_SEC);
-  return(ans);
+  return ans;
 }
 
 SEXP gshift(SEXP x, SEXP nArg, SEXP fillArg, SEXP typeArg) {
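
The integer64 branch added to `gprod` above can be sketched outside of R as follows. This is an illustration under stated assumptions, not the PR's code: `grouped_prod_i64` and `NA_I64` are hypothetical names, group ids are taken to be 0-based, and the NA sentinel is assumed to be `INT64_MIN`. The sketch also differs from the diff in two details: it tests for NaN explicitly before converting (converting NaN to an integer is undefined in C), and it compares against the exactly representable bound 2^63 rather than `INT64_MAX`.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define NA_I64 INT64_MIN  /* assumption: NA sentinel for 64-bit integers */

/* Per-group product of x[0..n-1], where grp[i] in [0, ngrp) is the group of x[i].
   Writes one result per group into out. Returns 0 on success, -1 if allocation fails. */
int grouped_prod_i64(const int64_t *x, const int *grp, int n, int ngrp,
                     bool narm, int64_t *out) {
  if (ngrp <= 0) return 0;
  long double *s = malloc((size_t)ngrp * sizeof *s);
  if (!s) return -1;
  for (int g = 0; g < ngrp; ++g) s[g] = 1.0L;

  for (int i = 0; i < n; ++i) {
    assert(grp[i] >= 0 && grp[i] < ngrp);
    if (x[i] == NA_I64) {
      if (!narm) s[grp[i]] = NAN;  /* NaN then propagates through later products */
      continue;
    }
    s[grp[i]] *= (long double)x[i];  /* numeric conversion, not a bit reinterpretation */
  }

  for (int g = 0; g < ngrp; ++g) {
    /* NA for NaN or for any product outside the open interval (-2^63, 2^63) */
    const bool na = isnan(s[g]) || s[g] >= 0x1p63L || s[g] <= -0x1p63L;
    out[g] = na ? NA_I64 : (int64_t)s[g];
  }
  free(s);
  return 0;
}
```

A caller passes the values, their group ids, and an output buffer with one slot per group. A group containing an NA yields `NA_I64` unless `narm` is true, and a product that does not fit in 64 bits also yields `NA_I64`, which matches the expectations in tests 2226.2 and 2226.3.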