Skip to content

keyby=TRUE/FALSE together with by=#4338

Merged
mattdowle merged 13 commits intomasterfrom
byebye-keyby
May 6, 2021
Merged

keyby=TRUE/FALSE together with by=#4338
mattdowle merged 13 commits intomasterfrom
byebye-keyby

Conversation

@jangorecki
Copy link
Copy Markdown
Member

closes #4307

@jangorecki jangorecki mentioned this pull request Apr 2, 2020
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 2, 2020

Codecov Report

Merging #4338 (808c697) into master (f638d4d) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #4338   +/-   ##
=======================================
  Coverage   99.42%   99.42%           
=======================================
  Files          73       73           
  Lines       14455    14459    +4     
=======================================
+ Hits        14372    14376    +4     
  Misses         83       83           
Impacted Files Coverage Δ
R/data.table.R 99.94% <100.00%> (+<0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f638d4d...808c697. Read the comment docs.

@MichaelChirico
Copy link
Copy Markdown
Member

Not 100% convinced of the need for it, but it's clearly come up for you a few times over the years, so no reason to keep an error in a demonstrated use case.

Changes LGTM

@jangorecki
Copy link
Copy Markdown
Member Author

jangorecki commented Apr 3, 2020

If a user want to have a flexibility in parameterizing sorted or unsorted results, then duplicating whole data.table query and branching on top of it was needed. With this PR it is a matter of passing a flag into single query.

if (!sorted) {
  DT[, .SD, by = grp]
} else {
  DT[, SD, keyby = grp]
}

vs

DT[, .SD, by = grp, keyby = sorted]

@jangorecki jangorecki added this to the 1.13.5 milestone Dec 13, 2020
@jangorecki
Copy link
Copy Markdown
Member Author

One example is that I want to compare benchmark script of current queries (by) vs. same queries using keyby. I have to edit every single query for that. Flag proposed in this PR makes it possible to just use logical variable as parameter.

@jangorecki jangorecki modified the milestones: 1.13.5, 1.13.7 Dec 13, 2020
@mattdowle mattdowle changed the title keyby arg now a sorting flag, #4307 keyby arg now a sorting flag Apr 15, 2021
@mattdowle mattdowle changed the title keyby arg now a sorting flag keyby=TRUE/FALSE together with by= Apr 15, 2021
Comment thread inst/tests/tests.Rraw Outdated
@jangorecki jangorecki requested a review from mattdowle April 26, 2021 12:15
Comment thread man/data.table.Rd Outdated
\emph{Advanced:} In the \code{X[Y, j]} form of grouping, the \code{j} expression sees variables in \code{X} first, then \code{Y}. We call this \emph{join inherited scope}. If the variable is not in \code{X} or \code{Y} then the calling frame is searched, its calling frame, and so on in the usual way up to and including the global environment.}

\item{keyby}{ Same as \code{by}, but with an additional \code{setkey()} run on the \code{by} columns of the result, for convenience. It is common practice to use `keyby=` routinely when you wish the result to be sorted.}
\item{keyby}{ Logical scalar indicating if groups defined in \code{by} argument should be sorted. If \code{by} is missing, then \code{keyby} can also take the same input as \code{by} would, but results will be sorted.}
Copy link
Copy Markdown
Member

@mattdowle mattdowle Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to allowing keyby=TRUE/FALSE which is great, this PR seems to tilt the emphasis too: the default for keyby= in the [.data.table signature is now FALSE and the argument description here starts with 'Logical scalar ...'. Combined, these changes convey, to me at least, that we prefer keyby=TRUE/FALSE now.

  1. Can the default for keyby be restored to be nothing, as it was before. That way, the keyby=FALSE doesn't convey that we prefer TRUE/FALSE. I still much prefer seeing keyby=<expression>, rather than both by= and keyby=.
  2. Can this manual page item be changed back to how it was, with an extra sentence at the end saying that keyby= can also be TRUE/FALSE too when by= is provided.

mattdowle added 3 commits May 5, 2021 21:00
…, and 2 substitute calls instead of 3. I was finding myself comparing the split-out cases before to see if everything had been remembered in each one.
@mattdowle mattdowle merged commit 89a14ba into master May 6, 2021
@mattdowle mattdowle deleted the byebye-keyby branch May 6, 2021 06:03
@jangorecki jangorecki modified the milestones: 1.14.9, 1.15.0 Oct 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

keyby=TRUE/FALSE together with by=

3 participants