From 87d8364d41fab78f4cfd6c1647651f5f2bb94e81 Mon Sep 17 00:00:00 2001
From: jangorecki
Date: Wed, 22 May 2019 11:23:05 +0530
Subject: [PATCH 1/2] mention options in importing vign

---
 vignettes/datatable-importing.Rmd | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/vignettes/datatable-importing.Rmd b/vignettes/datatable-importing.Rmd
index c1236d1239..11ab68d07d 100644
--- a/vignettes/datatable-importing.Rmd
+++ b/vignettes/datatable-importing.Rmd
@@ -122,6 +122,11 @@ The case for `:=` is slightly different, because `:=` is interpreted as a functi
 
 If you don't mind having `id` and `grp` registered as variables globally in your package namespace you can use `?globalVariables`. Be aware that these notes do not have any impact on the code or its functionality; if you are not going to publish your package, you may simply choose to ignore them.
 
+## Avoid of package options
+
+Common practice is to provide customization of various options globally for a package using `options` function. `data.table` is no exception here. Use of options of your dependency package should be avoided inside your package, and use of options by end user should be used with extra care. The reason for that is because those options works globally, for your package, other packages, and user's code.
+Consider the case when `data.table` is imported by `pkgX`, where `pkgX` compute a join. Then an end user sets `options(datatable.nomatch=NULL)`, as a result join performed by `pkgX` is now an inner join, not outer join. Options are generally safe when you work just with `data.table`, or the package you are developing will be an internal package that will work just with `data.table`. Remember that global options should be well documented.
+
 ## Troubleshooting
 
 If you face any problems in this process, before trying to ask questions or reporting issues, please confirm that the problem is reproducible in a clean R session using the R console: `R CMD check package.name`.
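
The `pkgX` scenario in the first patch can be sketched in base R, without `data.table` itself. Everything here is hypothetical: `lookup()`, `pkgX_fragile()`, `pkgX_robust()` and the option name `mypkg.nomatch` are not part of any real package, and `lookup()` only imitates the outer/inner distinction with `match()`; it is not how `data.table` performs joins. The sketch uses `0L` rather than `NULL` to request the inner-style behaviour because, in R, `options(name = NULL)` removes an option instead of storing `NULL`.

```r
# Hypothetical dependency function: its `nomatch` default comes from a global
# option. NA keeps unmatched keys (like an outer join); 0L drops them (like an
# inner join). This only mimics the outer/inner distinction using base match().
lookup = function(keys, tbl, nomatch = getOption("mypkg.nomatch", NA)) {
  idx = match(keys, tbl)                              # NA where a key has no match
  if (identical(nomatch, 0L)) idx = idx[!is.na(idx)]  # drop unmatched keys
  idx
}

# Hypothetical code inside another package, pkgX, that relies on the default ...
pkgX_fragile = function(keys, tbl) lookup(keys, tbl)
# ... versus pkgX code that states the behaviour it needs explicitly.
pkgX_robust = function(keys, tbl) lookup(keys, tbl, nomatch = NA)

keys = c("a", "z", "b")
tbl = c("a", "b", "c")
pkgX_fragile(keys, tbl)        # returns 1 NA 2
options(mypkg.nomatch = 0L)    # an end user changes the option for their own code
pkgX_fragile(keys, tbl)        # returns 1 2: the unmatched key is silently dropped
pkgX_robust(keys, tbl)         # returns 1 NA 2: unaffected by the user's option
options(mypkg.nomatch = NULL)  # setting an option to NULL removes it
```

The design point is that `pkgX_robust()` passes `nomatch` explicitly, so a user's global setting cannot change its result.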
From 49485a8531ce95013ff653d81c61c45c9c8f32b3 Mon Sep 17 00:00:00 2001
From: mattdowle
Date: Thu, 30 May 2019 18:59:46 -0700
Subject: [PATCH 2/2] revised new section in vignette

---
 vignettes/datatable-importing.Rmd | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/vignettes/datatable-importing.Rmd b/vignettes/datatable-importing.Rmd
index 11ab68d07d..16a3cb39d2 100644
--- a/vignettes/datatable-importing.Rmd
+++ b/vignettes/datatable-importing.Rmd
@@ -122,14 +122,15 @@ The case for `:=` is slightly different, because `:=` is interpreted as a functi
 
 If you don't mind having `id` and `grp` registered as variables globally in your package namespace you can use `?globalVariables`. Be aware that these notes do not have any impact on the code or its functionality; if you are not going to publish your package, you may simply choose to ignore them.
 
-## Avoid of package options
+## Care needed when providing and using options
 
-Common practice is to provide customization of various options globally for a package using `options` function. `data.table` is no exception here. Use of options of your dependency package should be avoided inside your package, and use of options by end user should be used with extra care. The reason for that is because those options works globally, for your package, other packages, and user's code.
-Consider the case when `data.table` is imported by `pkgX`, where `pkgX` compute a join. Then an end user sets `options(datatable.nomatch=NULL)`, as a result join performed by `pkgX` is now an inner join, not outer join. Options are generally safe when you work just with `data.table`, or the package you are developing will be an internal package that will work just with `data.table`. Remember that global options should be well documented.
+It is common practice for R packages to provide customization options that are set with `options(name=val)` and fetched with `getOption("name", default)`. Function arguments often specify a call to `getOption()` so that the user knows (from `?fun` or `args(fun)`) the name of the option controlling the default for that parameter; e.g. `fun(..., verbose=getOption("datatable.verbose", FALSE))`. All `data.table` options start with `datatable.` so as not to conflict with options in other packages. A user simply calls `options(datatable.verbose=TRUE)` to turn on verbosity. This affects all calls to `fun()` other than those where `verbose=` is provided explicitly; e.g. `fun(..., verbose=FALSE)`.
+
+The option mechanism in R is _global_. This means that if a user sets a `data.table` option for their own use, that setting also affects code inside any package that uses `data.table`. For an option like `datatable.verbose`, this is exactly the desired behavior, since the aim is to trace and log all `data.table` operations wherever they originate; turning on verbosity does not affect the results. Another unique-to-R and excellent-for-production option is R's `options(warn=2)`, which turns all warnings into errors. Again, the aim is to affect every warning in every package so that no warning is missed in production. There are also six `datatable.print.*` options and three optimization options, none of which affect the result of operations. However, there is one `data.table` option that does affect results and is therefore a concern: `datatable.nomatch`. This option changes the default join from outer to inner. [Aside: the default join is outer because outer is safer; it doesn't silently drop missing data.] Some users prefer an inner join as the default, and we provided this option for them. However, a user setting this option can unintentionally change the behavior of joins inside packages that use `data.table`. Accordingly, in v1.12.4, we have started the process of deprecating the `datatable.nomatch` option. It is the only `data.table` option with this concern.
 
 ## Troubleshooting
 
-If you face any problems in this process, before trying to ask questions or reporting issues, please confirm that the problem is reproducible in a clean R session using the R console: `R CMD check package.name`.
+If you face any problems in creating a package that uses `data.table`, please confirm that the problem is reproducible in a clean R session using the R console: `R CMD check package.name`.
 
 Some of the most common issues developers are facing are usually related to helper tools that are meant to automate some package development tasks, for example, using `roxygen` to generate your `NAMESPACE` file from metadata in the R code files. Others are related to helpers that build and check the package. Unfortunately, these helpers sometimes have unintended/hidden side effects which can obscure the source of your troubles. As such, be sure to double check using R console (run R on the command line) and ensure the import is defined in the `DESCRIPTION` and `NAMESPACE` files following the [instructions](#DESCRIPTION) [above](#NAMESPACE).
 
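
The option pattern in the revised section, an argument whose default is a `getOption()` call, can be illustrated in a few lines of base R. This is a sketch under assumptions: `fun` and the option `mypkg.verbose` are hypothetical stand-ins for a package function and an option such as `datatable.verbose`, not `data.table` code.

```r
# Hypothetical package function whose `verbose` default is read from a global
# option, following the fun(..., verbose=getOption(...)) pattern.
fun = function(x, verbose = getOption("mypkg.verbose", FALSE)) {
  if (verbose) message("fun(): processing ", length(x), " values")
  verbose  # returned only so the effect of the option is easy to check
}

old = options(mypkg.verbose = NULL)  # start with the option unset; keep old value
fun(1:3)                       # FALSE: falls back to the getOption() default
options(mypkg.verbose = TRUE)  # what an end user would run
fun(1:3)                       # TRUE, with a message: picks up the global option
fun(1:3, verbose = FALSE)      # FALSE: an explicit argument overrides the option
args(fun)                      # shows which option controls the default
options(old)                   # restore whatever was set before
```

`options()` invisibly returns the previous values, so saving them and calling `options(old)` afterwards restores the user's settings.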