New vignette -- Usages of .SD by MichaelChirico · Pull Request #3572 · Rdatatable/data.table

MichaelChirico · 2019-05-18T12:22:43Z

Not sure if we should track the png on GH or not

jangorecki · 2019-05-18T12:33:57Z

I haven't gone through it yet, but it looks like a comprehensive guide on using .SD.

I would avoid such a title, as some people might see it as "cryptic symbols".
Also, as discussed in the linked issue, we can have the scope of that vignette extended to other tricks in j. It would be very useful if you could leave placeholders for that in the document, so it can be filled in by others.
The png looks unnecessarily big; not sure if 12KB will make a difference, but recently Matt was dealing with compiler flags to reduce the size of the package.

mattdowle · 2019-05-18T16:49:53Z

Haven't looked either yet, but just on the png size: removing the -g compiler flag saved 1MB recently (.so reduced from 1.5MB to 0.5MB), so the package size is now approx. 4MB of the 5MB limit. 12KB is not an issue (1.2% of the remaining 1MB). The Pitching.RData (1.3MB) file in vignettes/ is more of a concern, but from what I can gather only the vignette PDF is installed and counts towards the 5MB installed size limit, so that should be ok.

mattdowle

Looks great!

codecov · 2019-05-22T03:55:26Z

Codecov Report

Merging #3572 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #3572   +/-   ##
=======================================
  Coverage   97.58%   97.58%           
=======================================
  Files          66       66           
  Lines       12695    12695           
=======================================
  Hits        12389    12389           
  Misses        306      306

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6484781...9660daa. Read the comment docs.

mattdowle · 2019-05-22T03:56:39Z

For future reference ...
It seems the Travis error when building vignettes, which suggests that rmarkdown is not available and should be installed, is spurious (seen that before, iirc). It just means that something is wrong with the vignette somewhere and you have to reproduce it locally to find what's wrong.
In addition to changing the RData version from 3 to 2, I needed to change cache= from TRUE to FALSE. Otherwise it produced a warning about the version 3 format, meaning that the package would then depend on R 3.5+.
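
For reference, the format change described above can be reproduced with base R's `save()`, which accepts a `version` argument. This is only a minimal sketch: the object `pitching_demo` and the file name are hypothetical stand-ins, not the vignette's actual data or the exact commands used in the commit.

```r
# Hypothetical stand-in for the vignette's data set
pitching_demo <- data.frame(W = c(10L, 12L), ERA = c(3.5, 2.9))

rda_path <- file.path(tempdir(), "pitching_demo.RData")

# version = 2 writes the older serialization format, readable by R < 3.5
save(pitching_demo, file = rda_path, version = 2)

# tools::checkRdaFiles() reports the format version of a saved file
rda_info <- tools::checkRdaFiles(rda_path)
rda_info$version
```

Checking the file with `tools::checkRdaFiles()` before committing is one way to confirm that a data file will not trigger the R 3.5+ dependency warning.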

jangorecki · 2019-05-22T05:16:57Z

Why not use csv.gz instead of RData? That way there would be no risk of the vignette only building on newer R because of a format incompatibility.

jangorecki · 2019-05-24T14:13:51Z

+
+This vignette will explain the most common ways to use the `.SD` variable in your `data.table` analyses. It is an adaptation of [this answer](https://stackoverflow.com/a/47406952/3576984) given on StackOverflow.
+
+# What is `.SD`?


https://rdatatable.gitlab.io/data.table/library/data.table/doc/datatable-sd-usage.html
When rendered from R, this results in "What is <code>.SD</code>?" as the tab name in the browser; it may be better to remove the code formatting and leave .SD as plaintext.

jangorecki · 2019-05-24T14:14:34Z

+  Pitching[ , coef(lm(ERA ~ ., data = .SD))['W'], .SDcols = c('W', rhs)]
+})
+barplot(lm_coef, names.arg = sapply(models, paste, collapse = '/'),
+        main = 'Wins Coefficient\nWiith Various Covariates',


Wiith double i

jangorecki · 2019-05-24T14:15:15Z

+
+## Conditional Joins
+
+`data.table` syntax is beautiful for its simplicity and robustness. The syntax `x[i]` flexibly handles two common approaches to subsetting -- when `i` is a `logical` vector, `x[i]` will return those rows of `x` corresponding to where `i` is `TRUE`; when `i` is _another `data.table`_, a (right) `join` is performed (in the plain form, using the `key`s of `x` and `i`, otherwise, when `on = ` is specified, using matches of those columns).


there is also a case of DT["someid"]
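
To make the cases in this thread concrete, here is a small sketch of the three forms of `i` mentioned: a logical condition, another `data.table`, and a character value looked up against the key. The table `x`, its columns, and the result names are invented for illustration and do not come from the vignette.

```r
library(data.table)

# Toy keyed table; names and values are illustrative only
x <- data.table(id = c("a", "b", "c"), v = 1:3, key = "id")

# 1. Logical i: keep the rows where the condition is TRUE
by_logical <- x[v > 1L]

# 2. data.table i: join, returning rows in the order of i
i <- data.table(id = c("c", "a"))
by_join <- x[i, on = "id"]

# 3. Character i: matched against the key of x
by_key <- x["b"]
```

The third form relies on `x` having a key; without one, a character `i` has nothing to be matched against.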

jangorecki · 2019-05-24T14:18:39Z

+
+Note that this approach can of course be combined with `.SDcols` to return only portions of the `data.table` for each `.SD` (with the caveat that `.SDcols` should be fixed across the various subsets)
+
+_NB_: `.SD[1L]` is currently optimized by [_`GForce`_](https://jangorecki.gitlab.io/data.table/library/data.table/html/datatable-optimize.html) ([see also](https://stackoverflow.com/questions/22137591/about-gforce-in-data-table-1-9-2)), `data.table` internals which massively speed up the most common grouped operations like `sum` or `mean` -- see `?GForce` for more details and keep an eye on/voice support for feature improvement requests for updates on this front: [1](https://github.com/Rdatatable/data.table/issues/735), [2](https://github.com/Rdatatable/data.table/issues/2778), [3](https://github.com/Rdatatable/data.table/issues/523), [4](https://github.com/Rdatatable/data.table/issues/971), [5](https://github.com/Rdatatable/data.table/issues/1197), [6](https://github.com/Rdatatable/data.table/issues/1414)


use Rdatatable namespace instead of jangorecki: https://Rdatatable.gitlab.io/data.table/library/data.table/html/datatable-optimize.html
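
As an illustration of the quoted passage, the sketch below combines `.SD[1L]` with `.SDcols` to take the first row of each group while keeping only selected columns. The table `DT` and its values are hypothetical and are not the `Pitching` data used in the vignette.

```r
library(data.table)

# Hypothetical toy data, loosely modelled on team-season records
DT <- data.table(
  team = c("A", "A", "B", "B"),
  year = c(2001L, 2002L, 2001L, 2002L),
  W    = c(10L, 12L, 8L, 9L),
  ERA  = c(3.5, 2.9, 4.1, 3.8)
)

# First row of each team's subset, restricted to the columns in .SDcols
first_rows <- DT[, .SD[1L], by = team, .SDcols = c("W", "ERA")]
first_rows
```

The result has one row per `team`, with the grouping column followed by `W` and `ERA`; `year` is dropped because it is not listed in `.SDcols`.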

Michael Chirico added 3 commits May 18, 2019 18:57

initial work on converting SO answer to vignette

b0380eb

Closes #3412 -- adds .SD vignette from SO answer

024c891

tidying code & knitr options

e2c4ba9

mattdowle added this to the 1.12.4 milestone May 22, 2019

mattdowle added the documentation label May 22, 2019

mattdowle approved these changes May 22, 2019

mattdowle added 2 commits May 21, 2019 20:07

Merge branch 'master' into sd_vignette

613f90c

RData/Rdata typo, and changed RData files from version 3 to 2 to avoid warning about package depending on R 3.5+

9660daa

mattdowle merged commit c68e95e into master May 22, 2019

mattdowle deleted the sd_vignette branch May 22, 2019 03:58

jangorecki reviewed May 24, 2019

MichaelChirico mentioned this pull request May 24, 2019

Follow-up to #3572 #3593

Merged

mattdowle merged 5 commits into master from sd_vignette
