I am filling this issue as a placeholder to evaluate users demand for such feature, at present there are no plans for incorporating it, so if you would need it be sure to upvote.
Extension of #2778.
library(data.table)
id = c(0L,1L,2L,5L,6L,8L)
x = data.table(date=as.IDate(id), value=c(1,2,3,4,5,6))
x
# date value
#1: 1970-01-01 1
#2: 1970-01-02 2
#3: 1970-01-03 3
#4: 1970-01-06 4
#5: 1970-01-07 5
#6: 1970-01-09 6
## shift value by 1 date
cbind(x, data.table(s_date=as.IDate(id-1), s_value=c(NA,1,2,NA,4,NA)))
# date value s_date s_value
#1: 1970-01-01 1 1969-12-31 NA
#2: 1970-01-02 2 1970-01-01 1
#3: 1970-01-03 3 1970-01-02 2
#4: 1970-01-06 4 1970-01-05 NA
#5: 1970-01-07 5 1970-01-06 4
#6: 1970-01-09 6 1970-01-08 NA
## shift value by 1 date locf
cbind(x, data.table(s_date=as.IDate(id-1), s_value=c(NA,1,2,3,4,5)))
# date value s_date s_value
#1: 1970-01-01 1 1969-12-31 NA
#2: 1970-01-02 2 1970-01-01 1
#3: 1970-01-03 3 1970-01-02 2
#4: 1970-01-06 4 1970-01-05 3
#5: 1970-01-07 5 1970-01-06 4
#6: 1970-01-09 6 1970-01-08 5
## shift value by 1 date nocb
cbind(x, data.table(s_date=as.IDate(id-1), s_value=c(1,1,2,4,4,6)))
# date value s_date s_value
#1: 1970-01-01 1 1969-12-31 1
#2: 1970-01-02 2 1970-01-01 1
#3: 1970-01-03 3 1970-01-02 2
#4: 1970-01-06 4 1970-01-05 4
#5: 1970-01-07 5 1970-01-06 4
#6: 1970-01-09 6 1970-01-08 6
## shift value by -1 date
cbind(x, data.table(s_date=as.IDate(id+1), s_value=c(2,3,NA,5,NA,NA)))
# date value s_date s_value
#1: 1970-01-01 1 1970-01-02 2
#2: 1970-01-02 2 1970-01-03 3
#3: 1970-01-03 3 1970-01-04 NA
#4: 1970-01-06 4 1970-01-07 5
#5: 1970-01-07 5 1970-01-08 NA
#6: 1970-01-09 6 1970-01-10 NA
## shift value by -1 date locf
cbind(x, data.table(s_date=as.IDate(id+1), s_value=c(2,3,3,5,5,6)))
# date value s_date s_value
#1: 1970-01-01 1 1970-01-02 2
#2: 1970-01-02 2 1970-01-03 3
#3: 1970-01-03 3 1970-01-04 3
#4: 1970-01-06 4 1970-01-07 5
#5: 1970-01-07 5 1970-01-08 5
#6: 1970-01-09 6 1970-01-10 6
## shift value by -1 date nocb
cbind(x, data.table(s_date=as.IDate(id+1), s_value=c(2,3,4,5,6,NA)))
# date value s_date s_value
#1: 1970-01-01 1 1970-01-02 2
#2: 1970-01-02 2 1970-01-03 3
#3: 1970-01-03 3 1970-01-04 4
#4: 1970-01-06 4 1970-01-07 5
#5: 1970-01-07 5 1970-01-08 6
#6: 1970-01-09 6 1970-01-10 NA
## rollsum value by 3 date
cbind(x, data.table(w_date=sapply(as.IDate(id), function(x) paste(x+((-2):0), collapse=",")), w_value=c(sum(NA,NA,1),sum(NA,1,2),sum(1,2,3),sum(NA,NA,4),sum(NA,4,5),sum(5,NA,6))))
# date value w_date w_value
#1: 1970-01-01 1 1969-12-30,1969-12-31,1970-01-01 NA
#2: 1970-01-02 2 1969-12-31,1970-01-01,1970-01-02 NA
#3: 1970-01-03 3 1970-01-01,1970-01-02,1970-01-03 6
#4: 1970-01-06 4 1970-01-04,1970-01-05,1970-01-06 NA
#5: 1970-01-07 5 1970-01-05,1970-01-06,1970-01-07 NA
#6: 1970-01-09 6 1970-01-07,1970-01-08,1970-01-09 NA
## rollsum value by 3 date locf
cbind(x, data.table(w_date=sapply(as.IDate(id), function(x) paste(x+((-2):0), collapse=",")), w_value=c(sum(NA,NA,1),sum(NA,1,2),sum(1,2,3),sum(3,3,4),sum(3,4,5),sum(5,5,6))))
# date value w_date w_value
#1: 1970-01-01 1 1969-12-30,1969-12-31,1970-01-01 NA
#2: 1970-01-02 2 1969-12-31,1970-01-01,1970-01-02 NA
#3: 1970-01-03 3 1970-01-01,1970-01-02,1970-01-03 6
#4: 1970-01-06 4 1970-01-04,1970-01-05,1970-01-06 10
#5: 1970-01-07 5 1970-01-05,1970-01-06,1970-01-07 12
#6: 1970-01-09 6 1970-01-07,1970-01-08,1970-01-09 16
## rollsum value by 3 date nocb
cbind(x, data.table(w_date=sapply(as.IDate(id), function(x) paste(x+((-2):0), collapse=",")), w_value=c(sum(1,1,1),sum(1,1,2),sum(1,2,3),sum(4,4,4),sum(4,4,5),sum(5,6,6))))
# date value w_date w_value
#1: 1970-01-01 1 1969-12-30,1969-12-31,1970-01-01 3
#2: 1970-01-02 2 1969-12-31,1970-01-01,1970-01-02 4
#3: 1970-01-03 3 1970-01-01,1970-01-02,1970-01-03 6
#4: 1970-01-06 4 1970-01-04,1970-01-05,1970-01-06 12
#5: 1970-01-07 5 1970-01-05,1970-01-06,1970-01-07 13
#6: 1970-01-09 6 1970-01-07,1970-01-08,1970-01-09 17
I am filling this issue as a placeholder to evaluate users demand for such feature, at present there are no plans for incorporating it, so if you would need it be sure to upvote.
Extension of #2778.
Rolling functions and
shifthas been implemented to operate on physical order of data, which means that they do not handle "gaps" in, for example, time/date fields. If one wants to shift anIDatetype vector by one day, one has to ensure that every single day is included in vector. If it isn't then one has to expand vector (or eventually a data.table) and perform shift afterwards. This can be flexibly and time efficiently solved using "rolling join" but the problem is memory consumption, especially for very sparse data. In an ideal world we would prefer to isolaterollfunctionality of rolling joins into helper function and re-use it in those cases.Some examples of expected output for input
x:Related issue tagged as data.table: https://stackoverflow.com/questions/33553230/calculate-moving-average-every-n-hours
Worth to note that pandas, as of 0.23.4, do support rolling functions by logical order when
windowargument receivedoffsetinstead ofint: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html