Merged
NAMESPACE (1 change: 1 addition & 0 deletions)
@@ -57,6 +57,7 @@ export(setnafill)
export(.Last.updated)
export(fcoalesce)
export(substitute2)
+export(DT) # mtcars |> DT(i,j,by) #4872

S3method("[", data.table)
S3method("[<-", data.table)

NEWS.md (6 changes: 6 additions & 0 deletions)
@@ -109,6 +109,12 @@

21. `melt()` was pseudo generic in that `melt(DT)` would dispatch to the `melt.data.table` method but `melt(not-DT)` would explicitly redirect to `reshape2`. Now `melt()` is standard generic so that methods can be developed in other packages, [#4864](https://github.com/Rdatatable/data.table/pull/4864). Thanks to @odelmarcelle for suggesting and implementing.

+22. `DT(i, j, by, ...)` has been added, i.e. functional form of a `data.table` query, [#641](https://github.com/Rdatatable/data.table/issues/641) [#4872](https://github.com/Rdatatable/data.table/issues/4872). Thanks to Yike Lu and Elio Campitelli for filing requests, many others for comments and suggestions, and Matt Dowle for the PR. This enables the `data.table` general form query to be invoked on a `data.frame` without converting it to a `data.table` first. The class of the input object is retained.
+
+```R
+mtcars |> DT(mpg>20, .(mean_hp=mean(hp)), by=cyl)
+```
+
## BUG FIXES

1. `by=.EACHI` when `i` is keyed but `on=` different columns than `i`'s key could create an invalidly keyed result, [#4603](https://github.com/Rdatatable/data.table/issues/4603) [#4911](https://github.com/Rdatatable/data.table/issues/4911). Thanks to @myoung3 and @adamaltmejd for reporting, and @ColeMiller1 for the PR. An invalid key is where a `data.table` is marked as sorted by the key columns but the data is not sorted by those columns, leading to incorrect results from subsequent queries.

R/data.table.R (6 changes: 4 additions & 2 deletions)
@@ -846,10 +846,10 @@ replace_dot_alias = function(e) {
if (!is.na(nomatch)) irows = irows[irows!=0L] # TO DO: can be removed now we have CisSortedSubset
if (length(allbyvars)) { ############### TO DO TO DO TO DO ###############
if (verbose) catf("i clause present and columns used in by detected, only these subset: %s\n", brackify(allbyvars))
-xss = x[irows,allbyvars,with=FALSE,nomatch=nomatch,mult=mult,roll=roll,rollends=rollends]
+xss = `[.data.table`(x,irows,allbyvars,with=FALSE,nomatch=nomatch,mult=mult,roll=roll,rollends=rollends)
} else {
if (verbose) catf("i clause present but columns used in by not detected. Having to subset all columns before evaluating 'by': '%s'\n", deparse(by))
-xss = x[irows,nomatch=nomatch,mult=mult,roll=roll,rollends=rollends]
+xss = `[.data.table`(x,irows,nomatch=nomatch,mult=mult,roll=roll,rollends=rollends)
}
if (bysub %iscall% ':' && length(bysub)==3L) {
byval = eval(bysub, setattr(as.list(seq_along(xss)), 'names', names(xss)), parent.frame())
@@ -1910,6 +1910,8 @@ replace_dot_alias = function(e) {
setalloccol(ans) # TODO: overallocate in dogroups in the first place and remove this line
}

+DT = `[.data.table` #4872
+
.optmean = function(expr) { # called by optimization of j inside [.data.table only. Outside for a small speed advantage.
if (length(expr)==2L) # no parameters passed to mean, so defaults of trim=0 and na.rm=FALSE
return(call(".External",quote(Cfastmean),expr[[2L]], FALSE))
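A note on the first `R/data.table.R` hunk, which swaps the nested `x[irows, ...]` subsets for direct calls to `` `[.data.table` ``. A plausible reason (not stated in the PR) is that once `DT = `[.data.table`` lets the method body run on a plain `data.frame`, a nested `x[...]` would dispatch on `class(x)` to `[.data.frame`, which rejects `data.table`-only arguments such as `with=`. The sketch below shows that S3 behaviour in base R only; the class `toy` and the functions `` `[.toy` `` and `toy_query` are hypothetical stand-ins, not part of `data.table`.

```r
# Base R only. `[.toy` and toy_query are hypothetical stand-ins for
# `[.data.table` and its functional alias DT; they are not data.table code.
`[.toy` = function(x, i, with = TRUE) {
  # Reaching this body shows the method ran, whatever class(x) is.
  paste("toy method ran on a", class(x)[1L])
}
toy_query = `[.toy`   # functional alias, like DT = `[.data.table`

df = data.frame(a = 1:3)

# Calling the method function directly bypasses S3 dispatch,
# so it runs even though df is a plain data.frame.
direct = toy_query(df, 1)

# A nested df[...] instead dispatches on class(df) to `[.data.frame`,
# which has no `with` argument, so R signals an "unused argument" error.
nested_failed = tryCatch({
  df[1, with = FALSE]
  FALSE
}, error = function(e) TRUE)

print(direct)          # [1] "toy method ran on a data.frame"
print(nested_failed)   # [1] TRUE
```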

man/data.table.Rd (8 changes: 8 additions & 0 deletions)
@@ -5,6 +5,7 @@
\alias{Ops.data.table}
\alias{is.na.data.table}
\alias{[.data.table}
+\alias{DT}
\alias{.}
\alias{.(}
\alias{.()}
@@ -217,6 +218,8 @@ The way to read this out loud is: "Take \code{DT}, subset rows by \code{i}, \emp
# see ?assign to add/update/delete columns by reference using the same consistent interface
}

+A \code{data.table} query may be invoked on a \code{data.frame} using functional form \code{DT(...)}, see examples. The class of the input is retained.
+
A \code{data.table} is a \code{list} of vectors, just like a \code{data.frame}. However :
\enumerate{
\item it never has or uses rownames. Rownames based indexing can be done by setting a \emph{key} of one or more columns or done \emph{ad-hoc} using the \code{on} argument (now preferred).
@@ -431,6 +434,11 @@ dev.off()
# using rleid, get max(y) and min of all cols in .SDcols for each consecutive run of 'v'
DT[, c(.(y=max(y)), lapply(.SD, min)), by=rleid(v), .SDcols=v:b]

+# functional query DT(...)
+if (getRversion() >= "4.1.0") { # native pipe |> new in R 4.1.0; parsed at run time so older R can still parse this file
+  eval(parse(text = "mtcars |> DT(mpg>20, .(mean_hp=mean(hp)), by=cyl)"))
+}
+
# Support guide and links:
# https://github.com/Rdatatable/data.table/wiki/Support
