From b0d12e68062b5d59030329f4c00abec5dcacabf9 Mon Sep 17 00:00:00 2001
From: lsilvest
Date: Sat, 30 Nov 2019 16:19:03 -0500
Subject: [PATCH 1/5] make CsubsetDT callable from C from in other packages

---
 NEWS.md    | 2 ++
 src/init.c | 1 +
 2 files changed, 3 insertions(+)

diff --git a/NEWS.md b/NEWS.md
index 4a0033fcfa..371c10635f 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -8,6 +8,8 @@

 1. `DT[, {...; .(A,B)}]` (i.e. when `.()` is the final item of a multi-statement `{...}`) now auto-names the columns `A` and `B` (just like `DT[, .(A,B)]`) rather than `V1` and `V2`, [#2478](https://github.com/Rdatatable/data.table/issues/2478) [#609](https://github.com/Rdatatable/data.table/issues/609). Similarly, `DT[, if (.N>1) .(B), by=A]` now auto-names the column `B` rather than `V1`. Explicit names are unaffected; e.g. `DT[, {... y= ...; .(A=C+y)}, by=...]` named the column `A` before, and still does. Thanks also to @renkun-ken for his go-first strong testing which caught an issue not caught by the test suite or by revdep testing, related to NULL being the last item, [#4061](https://github.com/Rdatatable/data.table/issues/4061).

+2. The C function `CsubsetDT` is exported in order to allow another package to use this fast subsetting of a `data.table` at C level. The exporting uses R's standard `R_RegisterCCallable` mechanism, which means other packages wishing to call this function use `R_GetCCallable` at C level in order to make the call. This mechanism is described in [Writing R Extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Linking-to-native-routines-in-other-packages).
+
 ## BUG FIXES

 ## NOTES
diff --git a/src/init.c b/src/init.c
index 518d1cc3ac..56531853ab 100644
--- a/src/init.c
+++ b/src/init.c
@@ -202,6 +202,7 @@ static void setSizes() {
 void attribute_visible R_init_datatable(DllInfo *info)
 // relies on pkg/src/Makevars to mv data.table.so to datatable.so
 {
+  R_RegisterCCallable("data.table", "CsubsetDT", (DL_FUNC) &subsetDT);
   R_registerRoutines(info, NULL, callMethods, NULL, externalMethods);
   R_useDynamicSymbols(info, FALSE);
   setSizes();

From c77b81901536920beb2d39d0da16f9b36e1c219d Mon Sep 17 00:00:00 2001
From: jangorecki
Date: Sun, 1 Dec 2019 08:58:11 +0530
Subject: [PATCH 2/5] more future proof link

---
 NEWS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/NEWS.md b/NEWS.md
index 371c10635f..15e4a80a65 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -8,7 +8,7 @@

 1. `DT[, {...; .(A,B)}]` (i.e. when `.()` is the final item of a multi-statement `{...}`) now auto-names the columns `A` and `B` (just like `DT[, .(A,B)]`) rather than `V1` and `V2`, [#2478](https://github.com/Rdatatable/data.table/issues/2478) [#609](https://github.com/Rdatatable/data.table/issues/609). Similarly, `DT[, if (.N>1) .(B), by=A]` now auto-names the column `B` rather than `V1`. Explicit names are unaffected; e.g. `DT[, {... y= ...; .(A=C+y)}, by=...]` named the column `A` before, and still does. Thanks also to @renkun-ken for his go-first strong testing which caught an issue not caught by the test suite or by revdep testing, related to NULL being the last item, [#4061](https://github.com/Rdatatable/data.table/issues/4061).

-2. The C function `CsubsetDT` is exported in order to allow another package to use this fast subsetting of a `data.table` at C level. The exporting uses R's standard `R_RegisterCCallable` mechanism, which means other packages wishing to call this function use `R_GetCCallable` at C level in order to make the call.
This mechanism is described in [Writing R Extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Linking-to-native-routines-in-other-packages).
+2. The C function `CsubsetDT` is exported in order to allow another package to use this fast subsetting of a `data.table` at C level. The exporting uses R's standard `R_RegisterCCallable` mechanism, which means other packages wishing to call this function use `R_GetCCallable` at C level in order to make the call. This mechanism is described in [Writing R Extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html) in _Linking to native routines in other packages_ section.

 ## BUG FIXES

From 360408fc3e68e0c6ece4da672ec6e59833a9eb8e Mon Sep 17 00:00:00 2001
From: jangorecki
Date: Sun, 1 Dec 2019 09:30:51 +0530
Subject: [PATCH 3/5] manual on exported C routines

---
 man/cdt.Rd   | 20 ++++++++++++++++++++
 src/init.c   |  2 ++
 src/subset.c |  2 +-
 3 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100644 man/cdt.Rd

diff --git a/man/cdt.Rd b/man/cdt.Rd
new file mode 100644
index 0000000000..13fa58b64d
--- /dev/null
+++ b/man/cdt.Rd
@@ -0,0 +1,20 @@
+\name{cdt}
+\alias{cdatatable}
+\title{ data.table exported C routines }
+\description{
+  Some internally used C routines are now exported. This interface should be considered experimental. The exported C routines and their signatures are listed below in the usage section.
+}
+\usage{
+# SEXP subsetDT(SEXP x, SEXP rows, SEXP cols);
+# p_dtCsubsetDT = R_GetCCallable("data.table", "CsubsetDT");
+}
+\details{
+  For details on how to use them, see the \emph{Linking to native routines in other packages} section of the \emph{Writing R Extensions} manual.
+}
+\note{
+  Be aware that C routines are likely to have less input validation than their corresponding R interface.
For example, one should not expect \code{DT[-5L]} to be equal to \code{.Call(CsubsetDT, DT, -5L, seq_along(DT))}, because the translation of \code{i=-5L} to \code{seq_len(nrow(DT))[-5L]} might happen at the R level. Moreover, checks that the \code{i} argument is in the range \code{1:nrow(DT)}, checks for missingness, etc. might happen at the R level too.
+}
+\references{
+  \url{https://cran.r-project.org/doc/manuals/r-release/R-exts.html}
+}
+\keyword{ data }
diff --git a/src/init.c b/src/init.c
index 56531853ab..607d1abfda 100644
--- a/src/init.c
+++ b/src/init.c
@@ -202,7 +202,9 @@ static void setSizes() {
 void attribute_visible R_init_datatable(DllInfo *info)
 // relies on pkg/src/Makevars to mv data.table.so to datatable.so
 {
+  // C exported routines, see ?cdt for details
   R_RegisterCCallable("data.table", "CsubsetDT", (DL_FUNC) &subsetDT);
+
   R_registerRoutines(info, NULL, callMethods, NULL, externalMethods);
   R_useDynamicSymbols(info, FALSE);
   setSizes();
diff --git a/src/subset.c b/src/subset.c
index f9426bb402..eb7bb6e797 100644
--- a/src/subset.c
+++ b/src/subset.c
@@ -242,7 +242,7 @@ static void checkCol(SEXP col, int colNum, int nrow, SEXP x)
 * 4) Could do it other ways but may as well go to C now as we were going to do that anyway
 */

-SEXP subsetDT(SEXP x, SEXP rows, SEXP cols) {
+SEXP subsetDT(SEXP x, SEXP rows, SEXP cols) { // API change needs update NEWS.md and man/cdt.Rd
   int nprotect=0;
   if (!isNewList(x)) error("Internal error. Argument 'x' to CsubsetDT is type '%s' not 'list'", type2char(TYPEOF(rows))); // # nocov
   if (!length(x)) return(x); // return empty list

From 6c7e3437cf3bc9720b250db837b5198a4cc77f80 Mon Sep 17 00:00:00 2001
From: jangorecki
Date: Sun, 1 Dec 2019 09:31:11 +0530
Subject: [PATCH 4/5] polish NEWS and link future manual page

---
 NEWS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/NEWS.md b/NEWS.md
index 15e4a80a65..de3e7cf81b 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -8,7 +8,7 @@

 1. `DT[, {...; .(A,B)}]` (i.e.
when `.()` is the final item of a multi-statement `{...}`) now auto-names the columns `A` and `B` (just like `DT[, .(A,B)]`) rather than `V1` and `V2`, [#2478](https://github.com/Rdatatable/data.table/issues/2478) [#609](https://github.com/Rdatatable/data.table/issues/609). Similarly, `DT[, if (.N>1) .(B), by=A]` now auto-names the column `B` rather than `V1`. Explicit names are unaffected; e.g. `DT[, {... y= ...; .(A=C+y)}, by=...]` named the column `A` before, and still does. Thanks also to @renkun-ken for his go-first strong testing which caught an issue not caught by the test suite or by revdep testing, related to NULL being the last item, [#4061](https://github.com/Rdatatable/data.table/issues/4061).

-2. The C function `CsubsetDT` is exported in order to allow another package to use this fast subsetting of a `data.table` at C level. The exporting uses R's standard `R_RegisterCCallable` mechanism, which means other packages wishing to call this function use `R_GetCCallable` at C level in order to make the call. This mechanism is described in [Writing R Extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html) in _Linking to native routines in other packages_ section.
+2. The C function `CsubsetDT` is exported in order to allow another package to use this fast subsetting of a `data.table` at C level. The exporting uses R's standard `R_RegisterCCallable` mechanism, which means other packages wishing to call this function use `R_GetCCallable` at C level in order to make the call. This mechanism is described in [Writing R Extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html) in _Linking to native routines in other packages_ section. See [`?cdt`](https://rdatatable.gitlab.io/data.table/reference/cdt.html) for details. Thanks to @lsilvest for request and submitting PR [#3751](https://github.com/Rdatatable/data.table/issues/3751).
 ## BUG FIXES

From 19dd7daed0b0aec66c1b3f3b997fdafa19eecaa3 Mon Sep 17 00:00:00 2001
From: jangorecki
Date: Sun, 1 Dec 2019 09:55:20 +0530
Subject: [PATCH 5/5] document in importing vignette also

---
 vignettes/datatable-importing.Rmd | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/vignettes/datatable-importing.Rmd b/vignettes/datatable-importing.Rmd
index 69cb7c3bd3..b263b5963a 100644
--- a/vignettes/datatable-importing.Rmd
+++ b/vignettes/datatable-importing.Rmd
@@ -189,7 +189,11 @@ If this is anyway your preferred approach to package development, please define

 ## Further information on dependencies

-For more canonical documentation of defining packages dependency check the official manual: [Writing R Extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html)
+For more canonical documentation on defining package dependencies, check the official manual: [Writing R Extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html).
+
+## Importing data.table C routines
+
+Some internally used C routines are now exported at the C level, so they can be used in R packages directly from their C code. See [`?cdt`](https://rdatatable.gitlab.io/data.table/reference/cdt.html) for details and the _Linking to native routines in other packages_ section of [Writing R Extensions](https://cran.r-project.org/doc/manuals/r-release/R-exts.html) for usage.

 ## Importing from non-R Applications {non-r-API}
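
Reviewer note: the register-then-look-up pattern this series relies on can be sketched in plain C without R installed. The block below is only a simulation under stated assumptions: `mock_register_ccallable`, `mock_get_ccallable`, `mock_subset`, `exporter_init` and `importer_call` are hypothetical stand-ins invented for illustration, and the stand-in routine works on `int` arrays rather than `SEXP` values. It is not data.table's or R's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as R's DL_FUNC: a generic function pointer type. */
typedef void *(*DL_FUNC)(void);

/* Hypothetical stand-in for the table R keeps of registered C callables. */
#define MAX_CALLABLES 8
static struct {
  const char *pkg;
  const char *name;
  DL_FUNC fn;
} registry[MAX_CALLABLES];
static int n_callables = 0;

/* Mock of R_RegisterCCallable: store a pointer under (package, name). */
static int mock_register_ccallable(const char *pkg, const char *name, DL_FUNC fn) {
  if (n_callables >= MAX_CALLABLES) return -1;
  registry[n_callables].pkg = pkg;
  registry[n_callables].name = name;
  registry[n_callables].fn = fn;
  n_callables++;
  return 0;
}

/* Mock of R_GetCCallable: look the pointer up again.
 * Assumption: this mock returns NULL when nothing is found, whereas the
 * real R function signals an R error instead. */
static DL_FUNC mock_get_ccallable(const char *pkg, const char *name) {
  for (int i = 0; i < n_callables; i++) {
    if (strcmp(registry[i].pkg, pkg) == 0 && strcmp(registry[i].name, name) == 0)
      return registry[i].fn;
  }
  return NULL;
}

/* Hypothetical stand-in for subsetDT: copies the requested 1-based rows
 * of x into out and returns how many rows were copied. */
static int mock_subset(const int *x, const int *rows, int nrows, int *out) {
  for (int i = 0; i < nrows; i++) out[i] = x[rows[i] - 1];
  return nrows;
}

/* The importing side must know the real signature to cast back to. */
typedef int (*subset_fn)(const int *, const int *, int, int *);

/* Exporting package: what the R_init_datatable hunk does, in miniature. */
static void exporter_init(void) {
  mock_register_ccallable("data.table", "CsubsetDT", (DL_FUNC) &mock_subset);
}

/* Importing package: look up once, cache the pointer, then call through it. */
static int importer_call(const int *x, const int *rows, int nrows, int *out) {
  static subset_fn fun = NULL;
  if (fun == NULL)
    fun = (subset_fn) mock_get_ccallable("data.table", "CsubsetDT");
  assert(fun != NULL);
  return fun(x, rows, nrows, out);
}
```

In a real importing package the lookup would instead be `R_GetCCallable("data.table", "CsubsetDT")` from `R_ext/Rdynload.h`, cast to `SEXP (*)(SEXP, SEXP, SEXP)` to match the signature listed in `man/cdt.Rd`. Caching the pointer in a `static` variable avoids repeating the lookup on every call. As the `\note` in `man/cdt.Rd` warns, the caller is then responsible for input validation that the R-level interface would otherwise perform.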