Support negative values of n in shift #3166
default :
  error("Unsupported type '%s'", type2char(TYPEOF(elem)));
}
copyMostAttrib(elem, tmp);
I honestly think these lines are vestigial, and they may be slowing shift down unnecessarily. There's no analogue in the 'lead' branch, and I think what they accomplish is already done in the INTSXP branch.
Which "these" precisely? What is it accomplishing that is already done in INTSXP?
c(2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 0L))
test(1960.4, shift(DT$x, -1, give.names = TRUE),
     structure(c(2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, NA),
               .Names = c("V1_lag_-1", NA, NA, NA, NA, NA, NA, NA, NA, NA)))
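To make the semantics of test 1960.4 concrete, here is a minimal standalone C sketch of shifting an integer vector by a signed offset. It assumes what this PR proposes: positive n lags (values move to later positions) and negative n leads. The function `shift_int` and its signature are hypothetical, not data.table's internal API, and R's `NA_integer_` is stood in for by `INT_MIN`, which is how R represents it internally. It also assumes `DT$x` is `1:10`, as the expected values suggest.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Stand-in for R's NA_integer_, which R stores as INT_MIN. */
#define SKETCH_NA_INT INT_MIN

/* Hypothetical helper (not data.table's API): write x shifted by the
 * signed offset n into out.
 *   n > 0 lags:  out[i] = x[i - n]
 *   n < 0 leads: out[i] = x[i + |n|]
 * Positions whose source index falls outside [0, len) get `fill`. */
static void shift_int(const int *x, size_t len, long n, int fill, int *out)
{
    assert(x != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        /* One code path for both directions: the sign of n decides
         * whether we read from an earlier or a later position. */
        long src = (long)i - n;
        out[i] = (src >= 0 && src < (long)len) ? x[src] : fill;
    }
}
```

Calling `shift_int` on the values 1 to 10 with `n = -1` and `fill = SKETCH_NA_INT` gives 2, 3, ..., 10 followed by NA, the same values test 1960.4 expects (names aside).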
This is certainly awkwardly named. The naming is done at the R level, so it would be very easy for me to change this to V1_lead_1. Any thoughts?
Rather than lag/lead, it would be better in this case to use shift_1 or shift_-1.
This test is invalid: we should not set names on a vector result, since they only make sense for list results. I will fix it as part of #3223. This is unrelated to the shift vs lag/lead naming.
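For reference, the `shift_<n>` naming suggested above could be sketched as follows. data.table builds these names at the R level, so this C helper `make_shift_name` and the exact `<base>_shift_<n>` format are hypothetical illustrations only; per the comment above, such names would apply to list results, not vector results.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (names are built in R in data.table): write a name
 * such as "V1_shift_-1" into buf. Returns what snprintf returns: the
 * length the full name needs (excluding the NUL), or a negative value on
 * an encoding error. The name is truncated if buf is too small. */
static int make_shift_name(char *buf, size_t size, const char *base, int n)
{
    assert(buf != NULL && base != NULL && strlen(base) > 0);
    /* One pattern covers both directions: the sign of n is part of the
     * name, so no separate "lag"/"lead" label is needed. */
    return snprintf(buf, size, "%s_shift_%d", base, n);
}
```

Because the sign is carried by `n`, the name stays unambiguous without a `lag`/`lead` label.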
Codecov Report
@@ Coverage Diff @@
## master #3166 +/- ##
==========================================
- Coverage 94.6% 92.17% -2.43%
==========================================
Files 61 61
Lines 11747 11545 -202
==========================================
- Hits 11113 10642 -471
- Misses 634 903 +269
Codecov doesn't appear to have re-triggered with the second commit; can it be triggered manually?
Just for the future: if you see an issue that someone is assigned to, prompt that person about it before starting development. I had started to refactor shift for that change, handling negative n without top-level branching, and with a bigger scope: removing DATAPTR calls and adding parallel processing for multiple columns/windows. My branch is far from done and my queue is now long, so we can proceed with your PR 👍 Nice to see you moving into more C coding!
\item{type}{ default is \code{"lag"}. The other possible value is \code{"lead"}. }
\item{n}{ integer vector denoting the offset by which to lead or lag the input. To create multiple lead/lag vectors, provide multiple values to \code{n}; negative values of \code{n} will "flip" the value of \code{type}, i.e., \code{n=-1} and \code{type='lead'} is the same as \code{n=1} and \code{type='lag'}. }
\item{fill}{ Value to use for padding when the window goes beyond the input length. }
\item{type}{ default is \code{"lag"} (look "backwards"). The other possible value is \code{"lead"} (look "forwards"). }
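The documented rule that a negative `n` flips `type` can be read as a normalisation step: convert every (type, n) pair to one signed lag offset, then shift once per value of `n`. The C sketch below assumes that rule exactly as written in the `\item{n}` entry; the enum `shift_type` and the functions `effective_lag` and `shift_many` are hypothetical and not part of data.table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical direction flag mirroring type = "lag" / "lead". */
typedef enum { SHIFT_LAG, SHIFT_LEAD } shift_type;

/* Normalise (type, n) to a signed lag offset: lag by n stays n and lead
 * by n becomes lag by -n. So lead with n = -1 and lag with n = 1 both
 * map to +1, as the documentation describes. */
static long effective_lag(shift_type type, long n)
{
    return type == SHIFT_LEAD ? -n : n;
}

/* Hypothetical helper: for each of the k offsets in ns, write one shifted
 * copy of x (length len) into row j of out, a k-by-len row-major buffer.
 * Out-of-range positions receive `fill`. */
static void shift_many(const int *x, size_t len, const long *ns, size_t k,
                       shift_type type, int fill, int *out)
{
    assert(x != NULL && ns != NULL && out != NULL);
    for (size_t j = 0; j < k; j++) {
        long off = effective_lag(type, ns[j]);
        int *row = out + j * len;
        for (size_t i = 0; i < len; i++) {
            long src = (long)i - off;   /* positive off reads earlier values */
            row[i] = (src >= 0 && src < (long)len) ? x[src] : fill;
        }
    }
}
```

With `ns = {1, -1}` and `SHIFT_LAG`, the first row is a lag by one and the second a lead by one; passing `SHIFT_LEAD` swaps the two rows, which is the "flip" the documentation describes.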
I would more strongly suggest using negative n instead of lead.
@jangorecki oh my! I didn't even bother to check, TBH; my bad. I knew this would have some overlap with your work on rolling operations in #2961, and I should have read around a bit more carefully. Agree there's room for more improvement with parallelism, but I saw that the switch to allowing +/-
easy when you do top... (referencing lines 24 to 26 in commit 70208d9)
Agree that coverage isn't kicking in on this one: the coverage displayed above is very wrong/stale. I just merged in master and resolved conflicts, which usually kicks off a refresh, but it either didn't or is taking a very long time. I'm not aware of any way to refresh coverage manually. Codecov seems to be having technical issues in general; e.g., we've also been seeing multiple comments posted by the codecov bot in other PRs.
So, taking a pragmatic approach ...
The code removal in shift.c is very nice. It had two large switches over column types, differing only in that one did lead and the other lag. You've folded them into one with a signed +/-n. Much better.
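The folding described above can be illustrated outside of R: once the direction is carried by the sign of n, one routine can serve every element type by copying a single contiguous block and padding the rest. This sketch works on raw bytes plus an element size instead of switching on R's SEXPTYPEs, and its name `shift_block` is hypothetical; it is not the code in shift.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type-agnostic shift (not data.table's shift.c):
 * x and out each hold len elements of `size` bytes and must not overlap;
 * fill points to one element. n > 0 lags, n < 0 leads. */
static void shift_block(const void *x, size_t len, size_t size, long n,
                        const void *fill, void *out)
{
    assert(x != NULL && fill != NULL && out != NULL && size > 0);
    const char *src = x;
    char *dst = out;
    /* Number of padded positions, capped at len. */
    size_t k = (size_t)labs(n);
    if (k > len) k = len;
    size_t keep = len - k;          /* elements carried over from x */
    size_t copy_to, copy_from, pad_at;
    if (n >= 0) {                   /* lag: data moves right, pad the head */
        copy_to = k; copy_from = 0; pad_at = 0;
    } else {                        /* lead: data moves left, pad the tail */
        copy_to = 0; copy_from = k; pad_at = keep;
    }
    if (keep > 0)
        memcpy(dst + copy_to * size, src + copy_from * size, keep * size);
    for (size_t i = 0; i < k; i++)
        memcpy(dst + (pad_at + i) * size, fill, size);
}
```

The per-type code shrinks to the element size, and the only direction-dependent logic is the small sign test choosing where the block lands, which mirrors the idea of one switch plus a signed n.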
I checked coverage of master::shift.c which is at 97.52%. Only 4 lines are missed and they are all error(). So that tells us that tests cover every case in both the old lead and lag branch which is good.
No existing tests change and you've added new ones. So even though a lot of C code is removed, everything still works and that isn't because the removed code had no coverage. Merging this PR to master should update master coverage and if there's any problems it can always be fixed post merge.
Great!
Closes #1708
My first non-trivial (but also basically trivial) foray into C code, so some careful eyes are warranted. I don't think I changed much, but still...
Note well the two comments on this PR.