rolling funs / shift could support logical window

I am filling this issue as a placeholder to evaluate users demand for such feature, at present there are no plans for incorporating it, so if you would need it be sure to upvote.
Extension of #2778.

Rolling functions and `shift` has been implemented to operate on physical order of data, which means that they do not handle "gaps" in, for example, time/date fields. If one wants to shift an `IDate` type vector by one day, one has to ensure that every single day is included in vector. If it isn't then one has to expand vector (or eventually a data.table) and perform shift afterwards. This can be flexibly and time efficiently solved using "rolling join" but the problem is memory consumption, especially for very sparse data. In an ideal world we would prefer to isolate `roll` functionality of rolling joins into helper function and re-use it in those cases.
Some examples of expected output for input `x`:
```r
library(data.table)
id = c(0L,1L,2L,5L,6L,8L)
x = data.table(date=as.IDate(id), value=c(1,2,3,4,5,6))
x
#         date value
#1: 1970-01-01     1
#2: 1970-01-02     2
#3: 1970-01-03     3
#4: 1970-01-06     4
#5: 1970-01-07     5
#6: 1970-01-09     6

## shift value by 1 date
cbind(x, data.table(s_date=as.IDate(id-1), s_value=c(NA,1,2,NA,4,NA)))
#         date value     s_date s_value
#1: 1970-01-01     1 1969-12-31      NA
#2: 1970-01-02     2 1970-01-01       1
#3: 1970-01-03     3 1970-01-02       2
#4: 1970-01-06     4 1970-01-05      NA
#5: 1970-01-07     5 1970-01-06       4
#6: 1970-01-09     6 1970-01-08      NA

## shift value by 1 date locf
cbind(x, data.table(s_date=as.IDate(id-1), s_value=c(NA,1,2,3,4,5)))
#         date value     s_date s_value
#1: 1970-01-01     1 1969-12-31      NA
#2: 1970-01-02     2 1970-01-01       1
#3: 1970-01-03     3 1970-01-02       2
#4: 1970-01-06     4 1970-01-05       3
#5: 1970-01-07     5 1970-01-06       4
#6: 1970-01-09     6 1970-01-08       5

## shift value by 1 date nocb
cbind(x, data.table(s_date=as.IDate(id-1), s_value=c(1,1,2,4,4,6)))
#         date value     s_date s_value
#1: 1970-01-01     1 1969-12-31       1
#2: 1970-01-02     2 1970-01-01       1
#3: 1970-01-03     3 1970-01-02       2
#4: 1970-01-06     4 1970-01-05       4
#5: 1970-01-07     5 1970-01-06       4
#6: 1970-01-09     6 1970-01-08       6

## shift value by -1 date
cbind(x, data.table(s_date=as.IDate(id+1), s_value=c(2,3,NA,5,NA,NA)))
#         date value     s_date s_value
#1: 1970-01-01     1 1970-01-02       2
#2: 1970-01-02     2 1970-01-03       3
#3: 1970-01-03     3 1970-01-04      NA
#4: 1970-01-06     4 1970-01-07       5
#5: 1970-01-07     5 1970-01-08      NA
#6: 1970-01-09     6 1970-01-10      NA

## shift value by -1 date locf
cbind(x, data.table(s_date=as.IDate(id+1), s_value=c(2,3,3,5,5,6)))
#         date value     s_date s_value
#1: 1970-01-01     1 1970-01-02       2
#2: 1970-01-02     2 1970-01-03       3
#3: 1970-01-03     3 1970-01-04       3
#4: 1970-01-06     4 1970-01-07       5
#5: 1970-01-07     5 1970-01-08       5
#6: 1970-01-09     6 1970-01-10       6

## shift value by -1 date nocb
cbind(x, data.table(s_date=as.IDate(id+1), s_value=c(2,3,4,5,6,NA)))
#         date value     s_date s_value
#1: 1970-01-01     1 1970-01-02       2
#2: 1970-01-02     2 1970-01-03       3
#3: 1970-01-03     3 1970-01-04       4
#4: 1970-01-06     4 1970-01-07       5
#5: 1970-01-07     5 1970-01-08       6
#6: 1970-01-09     6 1970-01-10      NA

## rollsum value by 3 date
cbind(x, data.table(w_date=sapply(as.IDate(id), function(x) paste(x+((-2):0), collapse=",")), w_value=c(sum(NA,NA,1),sum(NA,1,2),sum(1,2,3),sum(NA,NA,4),sum(NA,4,5),sum(5,NA,6))))
#         date value                           w_date w_value
#1: 1970-01-01     1 1969-12-30,1969-12-31,1970-01-01      NA
#2: 1970-01-02     2 1969-12-31,1970-01-01,1970-01-02      NA
#3: 1970-01-03     3 1970-01-01,1970-01-02,1970-01-03       6
#4: 1970-01-06     4 1970-01-04,1970-01-05,1970-01-06      NA
#5: 1970-01-07     5 1970-01-05,1970-01-06,1970-01-07      NA
#6: 1970-01-09     6 1970-01-07,1970-01-08,1970-01-09      NA

## rollsum value by 3 date locf
cbind(x, data.table(w_date=sapply(as.IDate(id), function(x) paste(x+((-2):0), collapse=",")), w_value=c(sum(NA,NA,1),sum(NA,1,2),sum(1,2,3),sum(3,3,4),sum(3,4,5),sum(5,5,6))))
#         date value                           w_date w_value
#1: 1970-01-01     1 1969-12-30,1969-12-31,1970-01-01      NA
#2: 1970-01-02     2 1969-12-31,1970-01-01,1970-01-02      NA
#3: 1970-01-03     3 1970-01-01,1970-01-02,1970-01-03       6
#4: 1970-01-06     4 1970-01-04,1970-01-05,1970-01-06      10
#5: 1970-01-07     5 1970-01-05,1970-01-06,1970-01-07      12
#6: 1970-01-09     6 1970-01-07,1970-01-08,1970-01-09      16

## rollsum value by 3 date nocb
cbind(x, data.table(w_date=sapply(as.IDate(id), function(x) paste(x+((-2):0), collapse=",")), w_value=c(sum(1,1,1),sum(1,1,2),sum(1,2,3),sum(4,4,4),sum(4,4,5),sum(5,6,6))))
#         date value                           w_date w_value
#1: 1970-01-01     1 1969-12-30,1969-12-31,1970-01-01       3
#2: 1970-01-02     2 1969-12-31,1970-01-01,1970-01-02       4
#3: 1970-01-03     3 1970-01-01,1970-01-02,1970-01-03       6
#4: 1970-01-06     4 1970-01-04,1970-01-05,1970-01-06      12
#5: 1970-01-07     5 1970-01-05,1970-01-06,1970-01-07      13
#6: 1970-01-09     6 1970-01-07,1970-01-08,1970-01-09      17
```
Related issue tagged as data.table: https://stackoverflow.com/questions/33553230/calculate-moving-average-every-n-hours

---

Worth to note that pandas, as of 0.23.4, do support rolling functions by logical order when `window` argument received `offset` instead of `int`: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.rolling.html
> window : int, or offset
    Size of the moving window. This is the number of observations used for calculating the statistic. Each window will be a fixed size.
    If its an offset then this will be the time period of each window. Each window will be a variable sized based on the observations included in the time-period. This is only valid for datetimelike indexes. This is new in 0.19.0


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rolling funs / shift could support logical window #3241

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

rolling funs / shift could support logical window #3241

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions