Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -185,7 +185,7 @@

34. `na.omit(DT)` did not remove `NA` in `nanotime` columns, [#4744](https://github.com/Rdatatable/data.table/issues/4744). Thanks Jean-Mathieu Vermosen for reporting, and Michael Chirico for the PR.

35. `DT[, min(intCol, na.rm=TRUE), by=grp]` would return `Inf` for any groups containing all NAs, with a type change from `integer` to `numeric` to hold the `Inf`, and with warning. Now `NA` is returned for such all-NA groups, without warning or type change. This is almost-surely less surprising, more convenient, consistent, and efficient. There was no user request for this, likely because our desire to be consistent with base R in this regard was known (base R's `min(x, na.rm=TRUE)` returns `Inf` with warning for all-NA input). Matt Dowle made this change when reworking internals, [#5105](https://github.com/Rdatatable/data.table/pull/5105). The old behavior seemed so bad, and since there was a warning too, it seemed more appropriate to treat it as a bug.
35. `DT[, min(intCol, na.rm=TRUE), by=grp]` would return `Inf` for any groups containing all NAs, with a type change from `integer` to `numeric` to hold the `Inf`, and with warning. Similarly `max` would return `-Inf`. Now `NA` is returned for such all-NA groups, without warning or type change. This is almost-surely less surprising, more convenient, consistent, and efficient. There was no user request for this, likely because our desire to be consistent with base R in this regard was known (`base::min(x, na.rm=TRUE)` returns `Inf` with warning for all-NA input). Matt Dowle made this change when reworking internals, [#5105](https://github.com/Rdatatable/data.table/pull/5105). The old behavior seemed so bad, and since there was a warning too, it seemed appropriate to treat it as a bug.

```R
DT
Expand Down Expand Up @@ -228,6 +228,8 @@
# 2: b NA # NA because there are no non-NA, naturally
# no inconvenient warning
```

36. `DT[, min(int64Col), by=grp]` (and `max`) would return incorrect results for `bit64::integer64` columns, [#4444](https://github.com/Rdatatable/data.table/issues/4444). Thanks to @go-see for reporting, and Michael Chirico for the PR.

## NOTES

Expand Down
16 changes: 16 additions & 0 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -17954,3 +17954,19 @@ test(2206.11, isRealReallyInt(integer()), FALSE)
DT = CJ(x=1:3, y=letters[1:2])
DT[, z := complex(real=1:6, imaginary=6:1)]
test(2207, dcast(DT, x~y, value.var="z"), data.table(x=1:3, a=c(1+6i, 3+4i, 5+2i), b=c(2+5i, 4+3i, 6+1i), key='x'))

# gmin/gmax for integer64, #4444
if (test_bit64) {
DT = data.table(grp=c(1L, 1L, 1L, 2L), i64=as.integer64(c(NA, 1:3)))
old = options(datatable.optimize=2L)
test(2208.1, DT[, min(i64), by=grp], data.table(grp=1:2, V1=as.integer64(c(NA, 3))))
test(2208.2, DT[, min(i64, na.rm=TRUE), by=grp], data.table(grp=1:2, V1=as.integer64(c(1, 3))))
test(2208.3, DT[, max(i64), by=grp], data.table(grp=1:2, V1=as.integer64(c(NA, 3))))
test(2208.4, DT[, max(i64, na.rm=TRUE), by=grp], data.table(grp=1:2, V1=as.integer64(2:3)))
# create an all-NA group
DT[, i64:=rev(i64)]
test(2208.7, DT[, min(i64, na.rm=TRUE), by=grp], data.table(grp=1:2, V1=as.integer64(c(1,NA))))
test(2208.8, DT[, max(i64, na.rm=TRUE), by=grp], data.table(grp=1:2, V1=as.integer64(c(3,NA))))
options(old)
}

63 changes: 44 additions & 19 deletions src/gsumm.c
Original file line number Diff line number Diff line change
Expand Up @@ -742,7 +742,7 @@ static SEXP gminmax(SEXP x, SEXP narm, const bool min)
for (int i=0; i<ngrp; ++i) ansd[i] = NA_INTEGER; // in the all-NA case we now return NA for type consistency
for (int i=0; i<n; ++i) {
const int ix = (irowslen == -1) ? i : irows[i]-1;
if (xd[ix] == NA_INTEGER) continue;
if (xd[ix]==NA_INTEGER) continue;
const int thisgrp = grp[i];
if (ansd[thisgrp]==NA_INTEGER || (xd[ix]<ansd[thisgrp])==min)
ansd[thisgrp] = xd[ix];
Expand Down Expand Up @@ -776,26 +776,51 @@ static SEXP gminmax(SEXP x, SEXP narm, const bool min)
break;
case REALSXP: {
ans = PROTECT(allocVector(REALSXP, ngrp));
double *ansd = REAL(ans);
const double *xd = REAL(x);
if (!LOGICAL(narm)[0]) {
const double init = min ? R_PosInf : R_NegInf;
for (int i=0; i<ngrp; ++i) ansd[i] = init;
for (int i=0; i<n; ++i) {
const int thisgrp = grp[i];
if (ISNAN(ansd[thisgrp])) continue;
const int ix = (irowslen == -1) ? i : irows[i]-1;
if (ISNAN(xd[ix]) || (xd[ix]<ansd[thisgrp])==min)
ansd[thisgrp] = xd[ix];
if (INHERITS(x, char_integer64)) {
int64_t *ansd = (int64_t *)REAL(ans);
const int64_t *xd = (const int64_t *)REAL(x);
if (!LOGICAL(narm)[0]) {
const int64_t init = min ? INT64_MAX : INT64_MIN+1;
for (int i=0; i<ngrp; ++i) ansd[i] = init;
for (int i=0; i<n; ++i) {
const int thisgrp = grp[i];
if (ansd[thisgrp]==NA_INTEGER64) continue;
const int ix = (irowslen == -1) ? i : irows[i]-1;
if (xd[ix]==NA_INTEGER64 || (xd[ix]<ansd[thisgrp])==min)
ansd[thisgrp] = xd[ix];
}
} else {
for (int i=0; i<ngrp; ++i) ansd[i] = NA_INTEGER64;
for (int i=0; i<n; ++i) {
const int ix = (irowslen == -1) ? i : irows[i]-1;
if (xd[ix]==NA_INTEGER64) continue;
const int thisgrp = grp[i];
if (ansd[thisgrp]==NA_INTEGER64 || (xd[ix]<ansd[thisgrp])==min)
ansd[thisgrp] = xd[ix];
}
}
} else {
for (int i=0; i<ngrp; ++i) ansd[i] = NA_REAL;
for (int i=0; i<n; ++i) {
const int ix = (irowslen == -1) ? i : irows[i]-1;
if (ISNAN(xd[ix])) continue;
const int thisgrp = grp[i];
if (ISNAN(ansd[thisgrp]) || (xd[ix]<ansd[thisgrp])==min)
ansd[thisgrp] = xd[ix];
double *ansd = REAL(ans);
const double *xd = REAL(x);
if (!LOGICAL(narm)[0]) {
const double init = min ? R_PosInf : R_NegInf;
for (int i=0; i<ngrp; ++i) ansd[i] = init;
for (int i=0; i<n; ++i) {
const int thisgrp = grp[i];
if (ISNAN(ansd[thisgrp])) continue;
const int ix = (irowslen == -1) ? i : irows[i]-1;
if (ISNAN(xd[ix]) || (xd[ix]<ansd[thisgrp])==min)
ansd[thisgrp] = xd[ix];
}
} else {
for (int i=0; i<ngrp; ++i) ansd[i] = NA_REAL;
for (int i=0; i<n; ++i) {
const int ix = (irowslen == -1) ? i : irows[i]-1;
if (ISNAN(xd[ix])) continue;
const int thisgrp = grp[i];
if (ISNAN(ansd[thisgrp]) || (xd[ix]<ansd[thisgrp])==min)
ansd[thisgrp] = xd[ix];
}
}
}}
break;
Expand Down