Merged
NEWS.md: 16 changes (14 additions, 2 deletions)
@@ -65,10 +65,22 @@

13. `unique(DT)` now returns `DT` early when there are no duplicates to save RAM, [#2013](https://github.com/Rdatatable/data.table/issues/2013). Thanks to Michael Chirico for the PR.

14. `uniqueN()` is now faster on logical vectors. Thanks to Hugh Parsonage for [PR#2648](https://github.com/Rdatatable/data.table/pull/2648).
```
N = 1e9
                                   was     now
x = c(TRUE,FALSE,NA,rep(TRUE,N))
uniqueN(x) == 3                    5.4s    0.00s
x = c(TRUE,rep(FALSE,N), NA)
uniqueN(x,na.rm=TRUE) == 2         5.4s    0.00s
x = c(rep(TRUE,N),FALSE,NA)
uniqueN(x) == 3                    6.7s    0.38s
```

15. Subsetting optimization with keys and indices is now possible for compound queries like `DT[a==1 & b==2]`, [#2472](https://github.com/Rdatatable/data.table/issues/2472).
Thanks to @MichaelChirico for reporting and to @MarkusBonsch for the implementation.

16. `melt.data.table` now offers friendlier functionality for providing `value.name` for `list` input to `measure.vars`, [#1547](https://github.com/Rdatatable/data.table/issues/1547). Thanks @MichaelChirico and @franknarf1 for the suggestion and use cases, @jangorecki and @mrdwab for implementation feedback, and @MichaelChirico for the final implementation.

#### BUG FIXES

R/duplicated.R: 5 changes (4 additions, 1 deletion)
@@ -142,7 +142,10 @@ uniqueN <- function(x, by = if (is.list(x)) seq_along(x) else NULL, na.rm=FALSE)
if (is.null(x)) return(0L)
if (!is.atomic(x) && !is.data.frame(x))
stop("x must be an atomic vector or data.frames/data.tables")
# removed by this PR: if (is.atomic(x)) x = as_list(x)
if (is.atomic(x)) {
if (is.logical(x)) return(.Call(CuniqueNlogical, x, na.rm=na.rm))
x = as_list(x)
}
if (is.null(by)) by = seq_along(x)
o = forderv(x, by=by, retGrp=TRUE, na.last=if (!na.rm) FALSE else NA)
starts = attr(o, 'starts')
inst/tests/tests.Rraw: 16 changes (16 additions, 0 deletions)
@@ -6486,6 +6486,22 @@ DT <- data.table(A = rep(1:3, each=4), B = rep(1:4, each=3), C = rep(1:2, 6))
test(1475.1, uniqueN(DT), 10L)
test(1475.2, DT[, .(uN=uniqueN(.SD)), by=A], data.table(A=1:3, uN=c(3L,4L,3L)))

# specialized uniqueN for logical vectors, PR#2648
test(1475.3, uniqueN(c(NA, TRUE, FALSE)), 3L)
test(1475.4, uniqueN(c(NA, TRUE, FALSE), na.rm = TRUE), 2L)
test(1475.5, uniqueN(c(TRUE, FALSE), na.rm = TRUE), 2L)
test(1475.6, uniqueN(c(TRUE, FALSE)), 2L)
test(1475.7, uniqueN(c(TRUE, NA)), 2L)
test(1475.8, uniqueN(c(TRUE, NA), na.rm=TRUE), 1L)
test(1475.9, uniqueN(c(FALSE, NA)), 2L)
test(1475.11, uniqueN(c(FALSE, NA), na.rm=TRUE), 1L)
test(1475.12, uniqueN(c(NA,NA)), 1L)
test(1475.13, uniqueN(c(NA,NA), na.rm=TRUE), 0L)
test(1475.14, uniqueN(NA), 1L)
test(1475.15, uniqueN(NA, na.rm=TRUE), 0L)
test(1475.16, uniqueN(logical()), 0L)
test(1475.17, uniqueN(logical(), na.rm=TRUE), 0L)

# preserve class attribute in GForce mean (and sum)
DT <- data.table(x = rep(1:3, each = 3), y = as.Date(seq(Sys.Date(), (Sys.Date() + 8), by = "day")))
test(1476.1, DT[, .(y=mean(y)), x], setDT(aggregate(y ~ x, DT, mean)))
src/data.table.h: 1 change (1 addition, 0 deletions)
@@ -4,6 +4,7 @@
// #include <signal.h> // the debugging machinery + breakpoint aid
// raise(SIGINT);
#include <stdint.h> // for uint64_t rather than unsigned long long
#include <stdbool.h>
#include "myomp.h"

// data.table depends on R>=3.0.0 when R_xlen_t was introduced
src/init.c: 2 changes (2 additions, 0 deletions)
@@ -76,6 +76,7 @@ SEXP fsort();
SEXP inrange();
SEXP between();
SEXP hasOpenMP();
SEXP uniqueNlogical();

// .Externals
SEXP fastmean();
@@ -154,6 +155,7 @@ R_CallMethodDef callMethods[] = {
{"Cinrange", (DL_FUNC) &inrange, -1},
{"Cbetween", (DL_FUNC) &between, -1},
{"ChasOpenMP", (DL_FUNC) &hasOpenMP, -1},
{"CuniqueNlogical", (DL_FUNC) &uniqueNlogical, -1},
{NULL, NULL, 0}
};

src/uniqlist.c: 25 changes (25 additions, 0 deletions)
@@ -228,3 +228,28 @@ SEXP nestedid(SEXP l, SEXP cols, SEXP order, SEXP grps, SEXP resetvals, SEXP mul
UNPROTECT(1);
return(ans);
}

SEXP uniqueNlogical(SEXP x, SEXP narmArg) {
// single pass; short-circuit and return as soon as all 3 values are found
if (!isLogical(x)) error("x is not a logical vector");
if (!isLogical(narmArg) || length(narmArg)!=1 || INTEGER(narmArg)[0]==NA_INTEGER) error("na.rm must be TRUE or FALSE");
bool narm = LOGICAL(narmArg)[0]==1;
const R_xlen_t n = xlength(x);
if (n==0)
return ScalarInteger(0); // empty vector
int first = LOGICAL(x)[0]; // int rather than Rboolean: the value may be NA_LOGICAL (INT_MIN)
R_xlen_t i=0;
while (++i<n && LOGICAL(x)[i]==first);
if (i==n)
return ScalarInteger(first==NA_INTEGER && narm ? 0 : 1); // all one value
int second = LOGICAL(x)[i];
// we've found 2 different values (first and second). Which one didn't we find? Then just look for that.
// NA_LOGICAL == INT_MIN checked in init.c
const int third = (first+second == 1) ? NA_LOGICAL : ( first+second == INT_MIN ? TRUE : FALSE );
if (third==NA_LOGICAL && narm)
return ScalarInteger(2); // TRUE and FALSE found before any NA, but na.rm=TRUE so we're done
while (++i<n) if (LOGICAL(x)[i]==third)
return ScalarInteger(3-narm);
return ScalarInteger(2-(narm && third!=NA_LOGICAL));
}