Skip to content

Support negative values of n in shift#3166

Merged
mattdowle merged 4 commits intomasterfrom
shift_negative_n
Dec 14, 2018
Merged

Support negative values of n in shift#3166
mattdowle merged 4 commits intomasterfrom
shift_negative_n

Conversation

@MichaelChirico
Copy link
Copy Markdown
Member

@MichaelChirico MichaelChirico commented Nov 30, 2018

Closes #1708

My first non-trivial (but also basically trivial) foray into C code so some careful eyes are warranted. I don't think I changed much but still..

Note well the two comments to this PR

Comment thread src/shift.c
default :
error("Unsupported type '%s'", type2char(TYPEOF(elem)));
}
copyMostAttrib(elem, tmp);
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly thing these lines are vestigial and I think they may be slowing down shift unnecessarily... there's no analogue in the 'lead' branch and I think what it's accomplishing is already done in the INTSXP branch.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which "these" precisely? what is it is accomplishing that is already done in INTSXP?

Comment thread inst/tests/tests.Rraw
c(2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 0L))
test(1960.4, shift(DT$x, -1, give.names = TRUE),
structure(c(2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, NA),
.Names = c("V1_lag_-1", NA, NA, NA, NA, NA, NA, NA, NA, NA)))
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is certainly awkwardly named. The naming is done at the R level, so it would be very easy for me to change this to V1_lead_1. Any thoughts?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better than lag/lead in this case to use shift_1 or shift_-1

Copy link
Copy Markdown
Member

@jangorecki jangorecki Dec 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is invalid, we should not set name on vector result, only for list results it make sense, will fix it as part of #3223 - unrelated to shift vs lag/lead naming

@codecov
Copy link
Copy Markdown

codecov bot commented Nov 30, 2018

Codecov Report

Merging #3166 into master will decrease coverage by 2.42%.
The diff coverage is 98.82%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3166      +/-   ##
==========================================
- Coverage    94.6%   92.17%   -2.43%     
==========================================
  Files          61       61              
  Lines       11747    11545     -202     
==========================================
- Hits        11113    10642     -471     
- Misses        634      903     +269
Impacted Files Coverage Δ
src/shift.c 97.47% <98.82%> (-0.04%) ⬇️
R/uniqlist.R 71.42% <0%> (-28.58%) ⬇️
src/fsort.c 74.09% <0%> (-21.72%) ⬇️
src/between.c 80% <0%> (-20%) ⬇️
R/frank.R 82.43% <0%> (-17.57%) ⬇️
R/duplicated.R 77.92% <0%> (-16.68%) ⬇️
R/merge.R 81.81% <0%> (-16.67%) ⬇️
R/setkey.R 83.33% <0%> (-15.37%) ⬇️
R/setops.R 84.92% <0%> (-14.04%) ⬇️
R/foverlaps.R 86.11% <0%> (-13.89%) ⬇️
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 02642ab...d9f1286. Read the comment docs.

@MichaelChirico
Copy link
Copy Markdown
Member Author

Codecov doesn't appear to have re-triggered with the second commit; can do manually?

Copy link
Copy Markdown
Member

@jangorecki jangorecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for future, if you see an issue that someone is assigned to, prompt that person about it before starting development. I started to refactor shift for that change, handling negative n without top level branching, also having bigger scope like removing DATAPTR calls, parallel processing for multiple columns/windows. My branch is far from being done and now my queue is long so we can proceed with your PR 👍 Nice to see you moving into more C coding!

Comment thread inst/tests/tests.Rraw
c(2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 0L))
test(1960.4, shift(DT$x, -1, give.names = TRUE),
structure(c(2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, NA),
.Names = c("V1_lag_-1", NA, NA, NA, NA, NA, NA, NA, NA, NA)))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better than lag/lead in this case to use shift_1 or shift_-1

Comment thread src/shift.c
default :
error("Unsupported type '%s'", type2char(TYPEOF(elem)));
}
copyMostAttrib(elem, tmp);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which "these" precisely? what is it is accomplishing that is already done in INTSXP?

Comment thread man/shift.Rd
\item{type}{ default is \code{"lag"}. The other possible value is \code{"lead"}. }
\item{n}{ integer vector denoting the offset by which to lead or lag the input. To create multiple lead/lag vectors, provide multiple values to \code{n}; negative values of \code{n} will "flip" the value of \code{type}, i.e., \code{n=-1} and \code{type='lead'} is the same as \code{n=1} and \code{type='lag'}. }
\item{fill}{ Value to use for padding when the window goes beyond the input length. }
\item{type}{ default is \code{"lag"} (look "backwards"). The other possible value is \code{"lead"} (look "forwards"). }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would more strongly suggest to use negative n instead of lead.

@MichaelChirico
Copy link
Copy Markdown
Member Author

@jangorecki oh my! i didn't even bother to check TBH, my bad.

I knew this would have some overlap with your work on rolling operations in #2961, I should have read around a bit more carefully.

Agree there's room for more improvement w parallelism but I saw that the switch to allowing +/- n in the current code base would be an easy switch and went for it 🔪

@jangorecki
Copy link
Copy Markdown
Member

jangorecki commented Nov 30, 2018

easy when you do top if branching :) I think it is possible take out memmove from if branch, similarly as I did for align=center/left in

data.table/src/froll.c

Lines 24 to 26 in 70208d9

int k_ = align==-1 ? k-1 : floor(k/2); // offset to shift
if (verbose) Rprintf("%s: align %d, shift answer by %d\n", __func__, align, -k_);
memmove((char *)ans, (char *)ans + (k_*sizeof(double)), (nx-k_)*sizeof(double)); // apply shift to achieve expected align

@mattdowle mattdowle added this to the 1.12.0 milestone Dec 14, 2018
@mattdowle mattdowle changed the title Closes #1708 -- add support for positive and negative values of n in shift Support negative values of n in shift Dec 14, 2018
Copy link
Copy Markdown
Member

@mattdowle mattdowle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that coverage isn't kicking in on this one: the coverage that is displayed above is very wrong/stale. I just merged in master and resolved conflicts which usually kicks off a refresh, but it either didn't or is taking a very long time. I'm not aware of any way to correct coverage manually. They seem to be having technical issues generally; e.g. we've been seeing multiple comments posted by the codecov bot too in other PRs.

So, taking a pragmatic approach ...

Code removal in shift.c very nice. It had two large switches through column types, but only differing in one doing lead and the other lag. You've folded it into one with a signed +/-n. Much better.
I checked coverage of master::shift.c which is at 97.52%. Only 4 lines are missed and they are all error(). So that tells us that tests cover every case in both the old lead and lag branch which is good.
No existing tests change and you've added new ones. So even though a lot of C code is removed, everything still works and that isn't because the removed code had no coverage. Merging this PR to master should update master coverage and if there's any problems it can always be fixed post merge.

Great!

@mattdowle mattdowle merged commit c434610 into master Dec 14, 2018
@mattdowle mattdowle deleted the shift_negative_n branch December 14, 2018 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants