diff --git a/NEWS.md b/NEWS.md
index 17d6622203..a2576e71d2 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -167,6 +167,8 @@
 8. OpenBSD 6.9 released May 2021 apparently uses a 16 year old version of zlib (v1.2.3 from 2005) which induces `Compress gzip error: -9` from `fwrite()`, [#5048](https://github.com/Rdatatable/data.table/issues/5048). Thanks to Philippe Chataignon for investigating and for the PR which attempts a solution.
+9. `?"."`, `?".."`, `?".("`, and `?".()"` now point to `?data.table`, [#4385](https://github.com/Rdatatable/data.table/issues/4385) [#4407](https://github.com/Rdatatable/data.table/issues/4407). To help users find the documentation for these convenience features available inside `DT[...]`. Recall that `.` is an alias for `list`, and `..var` tells `data.table` to look for `var` in the calling environment as opposed to a column of the table.
+
 # data.table [v1.14.0](https://github.com/Rdatatable/data.table/milestone/23?closed=1) (21 Feb 2021)
diff --git a/R/data.table.R b/R/data.table.R
index b3e6cc826d..06fadd9ee6 100644
--- a/R/data.table.R
+++ b/R/data.table.R
@@ -1351,7 +1351,7 @@ replace_dot_alias = function(e) {
 # There isn't a copy of the columns here, the xvar symbols point to the SD columns (copy-on-write).
 if (is.name(jsub) && is.null(lhs) && !exists(jsubChar<-as.character(jsub), SDenv, inherits=FALSE)) {
- stop("j (the 2nd argument inside [...]) is a single symbol but column name '",jsubChar,"' is not found. Perhaps you intended DT[, ..",jsubChar,"]. This difference to data.frame is deliberate and explained in FAQ 1.1.")
+ stop("j (the 2nd argument inside [...]) is a single symbol but column name '",jsubChar,"' is not found. If you intended to select columns using a variable in calling scope, please try DT[, ..",jsubChar,"]. The .. prefix conveys one-level-up similar to a file system path.")
 }
 jval = eval(jsub, SDenv, parent.frame())
diff --git a/man/data.table.Rd b/man/data.table.Rd
index e934028a3b..9df490f77d 100644
--- a/man/data.table.Rd
+++ b/man/data.table.Rd
@@ -5,6 +5,10 @@
 \alias{Ops.data.table}
 \alias{is.na.data.table}
 \alias{[.data.table}
+\alias{.}
+\alias{.(}
+\alias{.()}
+\alias{..}
 \title{ Enhanced data.frame }
 \description{
 \code{data.table} \emph{inherits} from \code{data.frame}. It offers fast and memory efficient: file reader and writer, aggregations, updates, equi, non-equi, rolling, range and interval joins, in a short and flexible syntax, for faster development.
@@ -276,9 +280,9 @@ DT[2:5, cat(v, "\n")] # just for j's side effect
 # select columns the data.frame way
 DT[, 2] # 2nd column, returns a data.table always
-colNum = 2 # to refer vars in `j` from the outside of data use `..` prefix
-DT[, ..colNum] # same, equivalent to DT[, .SD, .SDcols=colNum]
-DT[["v"]] # same as DT[, v] but much faster
+colNum = 2
+DT[, ..colNum] # same, .. prefix conveys to look for colNum one-level-up in calling scope
+DT[["v"]] # same as DT[, v] but faster if called in a loop
 # grouping operations - j and by
 DT[, sum(v), by=x] # ad hoc by, order of groups preserved in result
diff --git a/vignettes/datatable-faq.Rmd b/vignettes/datatable-faq.Rmd
index 1df42e166c..f66f9611f1 100644
--- a/vignettes/datatable-faq.Rmd
+++ b/vignettes/datatable-faq.Rmd
@@ -66,22 +66,21 @@ Also continue reading and see the FAQ after next. Skim whole documents before ge
 The `j` expression is the 2nd argument. Try `DT[ , c("x","y","z")]` or `DT[ , .(x,y,z)]`.
-## I assigned a variable `mycol = "x"` but then `DT[ , mycol]` returns `"x"`. How do I get it to look up the column name contained in the `mycol` variable?
+## I assigned a variable `mycol="x"` but then `DT[, mycol]` returns an error. How do I get it to look up the column name contained in the `mycol` variable?
-What's happening is that the `j` expression sees objects in the calling scope. The variable `mycol` does not exist as a column name of `DT` so `data.table` then looked in the calling scope and found `mycol` there and returned its value `"x"`. This is correct behaviour currently. Had `mycol` been a column name, then that column's data would have been returned.
+The error is that column named `"mycol"` cannot be found, and this error is correct. `data.table`'s scoping is different to `data.frame` in that you can use column names as if they are variables directly inside `DT[...]` without prefixing each column name with `DT$`; see FAQ 1.1 above.
-To get the column `x` from `DT`, there are a few options:
+To use `mycol` to select the column `x` from `DT`, there are a few options:
 ```r
-# using .. to tell data.table the variable should be evaluated
-DT[ , ..mycol]
-# using with=FALSE to do the same
-DT[ , mycol, with=FALSE]
-# treating DT as a list and using [[
-DT[[mycol]]
+DT[, ..mycol] # .. prefix conveys to look for the mycol one level up in calling scope
+DT[, mycol, with=FALSE] # revert to data.frame behavior
+DT[[mycol]] # treat DT as a list and use [[ from base R
 ```
-The `with` argument refers to the `base` function `with` -- when `with=TRUE`, `data.table` operates similar to `with`, i.e. `DT[ , mycol]` behaves like `with(DT, mycol)`. When `with=FALSE`, the standard `data.frame` evaluation rules apply.
+See `?data.table` for more details about the `..` prefix.
+
+The `with` argument takes its name from the `base` function `with()`. When `with=TRUE` (default), `data.table` operates similar to `with()`, i.e. `DT[, mycol]` behaves like `with(DT, mycol)`. When `with=FALSE`, the standard `data.frame` evaluation rules apply to all variables in `j` and you can no longer use column names directly.
 ## What are the benefits of being able to use column names as if they are variables inside `DT[...]`?