Merged
149 commits
57be801
roll dev
jangorecki Feb 7, 2018
1130953
roll dev
jangorecki Mar 16, 2018
0828724
roll-dev
jangorecki Apr 17, 2018
3644d23
dev
jangorecki Apr 17, 2018
ce91559
getting closer
jangorecki Apr 20, 2018
4edc5c8
wow, it works, and it is so fast!
jangorecki Apr 20, 2018
890e428
benchmark vs RcppRoll also
jangorecki Apr 20, 2018
eb20900
remove calling fun defined later
jangorecki Apr 20, 2018
60df73c
add rolling function roadmaps and implementation notes
jangorecki Apr 21, 2018
75ce9f4
idea only make sense when calc many windows, unlikely scenario
jangorecki Apr 21, 2018
26561ee
fix broken example for now
jangorecki Apr 21, 2018
94d3ba1
remove verbose from API, keep for dev
jangorecki Apr 21, 2018
ef7be1e
pass R check
jangorecki Apr 21, 2018
5c1730d
roll.md moved to gh issue #2778
jangorecki Apr 21, 2018
4a70a67
first roll tests, simple just to raise alert when something breaks
jangorecki Apr 21, 2018
cb4c137
add missing loading pkg
jangorecki Apr 21, 2018
1e4f630
add api for align
jangorecki Apr 21, 2018
338531b
proper tests for align
jangorecki Apr 21, 2018
4f28c14
edge cases test scenarios
jangorecki Apr 21, 2018
4fe46e2
cleanup dev script
jangorecki Apr 21, 2018
77ae3c5
support for double, int, for any other stop
jangorecki Apr 21, 2018
7aea138
simplify code by new var
jangorecki Apr 21, 2018
b3ad8a9
structure for adaptive T/F
jangorecki Apr 22, 2018
ffbfbc9
cleaning up code for n as list, edge cases unit tests
jangorecki Apr 22, 2018
04d8ef7
proper handling non-interactive tests
jangorecki Apr 22, 2018
a83255c
minor improvements to n handling
jangorecki Apr 22, 2018
55b4c33
n list check for integer types
jangorecki Apr 23, 2018
fd48ef5
roll.Rd a little more info
jangorecki Apr 23, 2018
b294988
new na.rm argument
jangorecki Apr 23, 2018
93bdfa7
fix broken roll.Rd file
jangorecki Apr 23, 2018
aa1266a
Rd title minor
jangorecki Apr 23, 2018
f64064f
put notes api difference to zoo in docs
jangorecki Apr 23, 2018
dbd6967
consistent error message about reporting issue
jangorecki Apr 24, 2018
9b5b3e5
fix roll.Rd
jangorecki Apr 24, 2018
b31a4a8
minor doc improvement
jangorecki Apr 24, 2018
2fa8081
add warning when NA propagated till resolved
jangorecki Apr 24, 2018
6062508
cleanup tests for na.rm support
jangorecki Apr 24, 2018
f45e011
proper handling of NA everytime, also minor slowdown
jangorecki Apr 24, 2018
0de759d
move inner loop to rollmeanVectorRaw, use openmp
jangorecki Apr 24, 2018
07132ca
bigger benchmark for parallel processing
jangorecki Apr 24, 2018
6faba4e
all zoo tests into requireNamespace
jangorecki Apr 24, 2018
466f2d7
assume non-NA input, new hasNA argument
jangorecki Apr 25, 2018
32004c6
rename rollmean to frollmean
jangorecki Apr 25, 2018
05bfb02
roundoff correct in-dev, adaptive in-dev
jangorecki Apr 26, 2018
31a5678
exact default FALSE, still in dev
jangorecki Apr 26, 2018
30912b0
second attempt for roundoff correction
jangorecki Apr 27, 2018
5f2574e
cleanup roundoff attempts
jangorecki Apr 27, 2018
f7b02a9
no rounding error correction for now
jangorecki Apr 28, 2018
85117ad
window of size 0 is not allowed anymore
jangorecki Apr 28, 2018
a45e3ad
adaptive TRUE first working
jangorecki Apr 28, 2018
1bcc1dc
adaptive rollmean better, but suffer badly from roundoff
jangorecki Apr 28, 2018
dc9bb7d
exact cleanup, error corrections as extra loop
jangorecki Apr 28, 2018
1ff466d
makoving child function to plain C
jangorecki May 1, 2018
63a0787
code separation for C and R C
jangorecki May 1, 2018
93c0928
dt header update
jangorecki May 1, 2018
ec780b6
moving to C, does not work yet
jangorecki May 2, 2018
2d97631
rewritten from 80dcbd0 and tracked down segfault
jangorecki May 3, 2018
671fbca
allocate storage for results outside of openmp block
jangorecki May 3, 2018
600c005
no need int64 to store number of columns
jangorecki May 3, 2018
6d63015
Revert "no need int64 to store number of columns"
jangorecki May 3, 2018
2f8cda0
roll manual formatting
jangorecki May 4, 2018
275f1dc
API for adaptive roll, extra validation
jangorecki May 4, 2018
287df4f
make C rollmean generic rollfun, added rollsum
jangorecki May 16, 2018
9f36e72
support for long vector?, probably will not work on R 3.1.0
jangorecki May 16, 2018
13d7964
long vectors supported
jangorecki May 16, 2018
3334fc4
single froll function to call C froll, not force zoo for vanilla test
jangorecki May 16, 2018
4e426a1
align dev, fixed=right now
jangorecki May 17, 2018
41ce75a
no debug msg for speed measurment
jangorecki May 18, 2018
0b369f9
align for non-adaptive, 3 smaller loops instead of 1 big, less if-ing…
jangorecki May 19, 2018
2e1a094
simplify tri-state boolean on C level
jangorecki May 20, 2018
eb6c561
verbose arg, more C variables in parallel region, less R pointers
jangorecki May 20, 2018
da143a0
window width argument in plain C
jangorecki May 20, 2018
dea6e82
verbose switch to single core
jangorecki May 20, 2018
4b5b21f
rename xrows to inx for consistency
jangorecki May 20, 2018
4fbbc1b
dev window bigger than input
jangorecki May 22, 2018
07515e6
window longer than data works also for rollsum
jangorecki May 22, 2018
249a895
use pointers instead of arrays in function declaratione
jangorecki May 22, 2018
acbf086
new arg partial, more align tests
jangorecki May 22, 2018
64b6ea0
align and na.rm works, comments cleanup, maintain in single place, mo…
jangorecki May 23, 2018
f1fca6c
rollfun now uses openmp safely, remove R dataptr in parallel region
jangorecki May 29, 2018
3e47078
nsize of array that ikl points to might have been wrong
jangorecki May 30, 2018
7c23743
rename vars for consistency, cleanup comments
jangorecki May 30, 2018
c4e0fa1
proper arrays len
jangorecki May 30, 2018
1dbb6f3
parallelize over columns AND windows
jangorecki May 30, 2018
25ffe9f
rollmean adaptive !exact and !!exact
jangorecki Jun 8, 2018
ebf056d
added support for fill argument in adaptive roll fun
jangorecki Jun 8, 2018
3f7a647
adaptive combined with fill unit tests
jangorecki Jun 8, 2018
32baef6
adaptive limitations errors moved to C
jangorecki Jun 9, 2018
089de55
adaptive na.rm dev
jangorecki Jun 10, 2018
1a366ed
adaptive moving average done with na.rm, exact support
jangorecki Jun 13, 2018
ff16001
verbose logical instead of integer
jangorecki Jun 17, 2018
3fb1de8
update roll manual
jangorecki Jun 17, 2018
666260b
rename files and function for f prefix for consistency to other _fast…
jangorecki Jun 18, 2018
146c69f
few more froll tests
jangorecki Jun 18, 2018
7b06666
remove AS_NUMERIC from C code in froll
jangorecki Jun 18, 2018
099cd60
rename manual file, and minor fix
jangorecki Jun 19, 2018
1115c95
froll n arg type check and coerce moved to C
jangorecki Jun 19, 2018
50d78ba
adaptive roll funs explicitly always
jangorecki Jun 19, 2018
d4799d0
froll fill arg coercion, avoid SEXP when not really needed
jangorecki Jun 19, 2018
8b785d5
fill arg edge case fix for NA_integer_
jangorecki Jun 19, 2018
1e70594
remove dev comments
jangorecki Jun 20, 2018
98cc531
froll adaptive support for long vector also in n arg
jangorecki Jun 20, 2018
d6b52cd
Revert "froll adaptive support for long vector also in n arg", the issue
jangorecki Jun 20, 2018
2573e0b
froll exact=T, adaptive=F, align="right"
jangorecki Jun 20, 2018
1632ffe
froll exact unit test
jangorecki Jun 20, 2018
058859b
fix exact=F partial=T for partial window
jangorecki Jun 21, 2018
96af02b
exact=T and align=any, not yet roundoff correction
jangorecki Jun 25, 2018
6bc5a8d
fix exact=F partial=T align=left/center
jangorecki Jun 25, 2018
0138dfa
fix edge case for exact=T align=left partial=T
jangorecki Jun 25, 2018
d8af6b5
minor tests comments before double->long double change for exact=F
jangorecki Jun 25, 2018
93780c3
froll exact=T roundoff correction
jangorecki Jun 28, 2018
070ca8f
docomment roundoff when exact=f and sliding window double type
jangorecki Jun 30, 2018
28491a6
use long double also for exact=F
jangorecki Jun 30, 2018
03171c4
document long double impact, twice smaller roundoff, similar time
jangorecki Jun 30, 2018
4b97c35
froll split to more functions by exact argument
jangorecki Jun 30, 2018
d3918c0
froll c funs rename
jangorecki Jun 30, 2018
df65512
refactor frollmean, drop partial argument
jangorecki Jul 1, 2018
2bfd4c5
rename frollfun to frollfunR for consistency to freadR and writeR
jangorecki Jul 1, 2018
a1c5b43
refactor frollmeanExact, drop partial argument
jangorecki Jul 2, 2018
4545ef3
formatting and less verbosity after passing tests
jangorecki Jul 2, 2018
b94ba70
tests updated, skip benchmarks
jangorecki Jul 2, 2018
d651eaa
non-finite values handling
jangorecki Jul 2, 2018
bbe709a
refactor frollmeanAdaptive and exactAdaptive
jangorecki Jul 3, 2018
71e7dfb
frollmean exact=F support Inf same as NA, NaN
jangorecki Jul 3, 2018
0e01936
remove partial arg from api of frollfun
jangorecki Jul 3, 2018
29b0e22
minor refactor for var names consistency
jangorecki Jul 3, 2018
e68310c
cleanup and extend tests for adaptive=T
jangorecki Jul 3, 2018
bdbc9ab
proper fun name in init.c
jangorecki Jul 3, 2018
78a4059
ensure cast from long double to double is after division, some tests …
jangorecki Jul 3, 2018
ff9dc56
upgrade examples for fast R ways for moving average, added sub-second…
jangorecki Jul 3, 2018
04d873b
to simplify examples remove arg for unused non-finite rm argument
jangorecki Jul 3, 2018
86782f9
Inf properly handled and documented already
jangorecki Jul 5, 2018
fce9648
improve verbose messages and test them
jangorecki Jul 5, 2018
8ef7cc1
improvement for nested parallelism to always use all cores by default
jangorecki Jul 6, 2018
a994add
code reorg, C wrappers, align uses memmove instead of memcpy, unit te…
jangorecki Nov 10, 2018
dca6d07
resolve gcc warning by initializing _nk_ value in branch rather than …
jangorecki Nov 11, 2018
0001bd6
nested parallelism proper way, omp_set_nested
jangorecki Nov 11, 2018
8d70c1d
do parallel when verbose not used, early NA stopping for frollmeanFas…
jangorecki Nov 11, 2018
486aec6
cleanup and extends froll tests
jangorecki Nov 11, 2018
2fbf2c8
remove unimplemented frollsum, exact turned into algo argument for algos
jangorecki Nov 11, 2018
684d188
fix segfault, update verbose msgs, cumsum based rollmean checked: cou…
jangorecki Nov 11, 2018
295ad94
update manual long vector test, no longer segfault
jangorecki Nov 12, 2018
70208d9
more tests to improve code coverage
jangorecki Nov 15, 2018
1fdccb6
Merge branch 'master' into roll
mattdowle Dec 6, 2018
62693fc
confirm memory allocated, add NEWS entry
jangorecki Dec 6, 2018
73a5f9c
exception handling in parallel regions
jangorecki Dec 6, 2018
5a7bc0d
clarify verbose message about parallel execution
jangorecki Dec 7, 2018
444404d
fix compiler warning on improper initializatin of struct
jangorecki Dec 7, 2018
31dca17
Merge branch 'master' into roll
mattdowle Dec 14, 2018
1 change: 1 addition & 0 deletions NAMESPACE
@@ -46,6 +46,7 @@ export(rollup)
S3method(groupingsets, data.table)
S3method(cube, data.table)
S3method(rollup, data.table)
export(frollmean)

S3method("[", data.table)
S3method("[<-", data.table)
2 changes: 2 additions & 0 deletions NEWS.md
@@ -34,6 +34,8 @@

8. `DT[..., .SDcols=]` now accepts `patterns()`; e.g. `DT[..., .SDcols=patterns("^V")]`, for filtering columns according to a pattern (as in `melt.data.table`), [#1878](https://github.com/Rdatatable/data.table/issues/1878). Thanks to many people for pushing for this and @MichaelChirico for ultimately filing the PR. See `?data.table` for full details and examples.

9. New `frollmean` has been added to calculate _rolling mean_. Function name and arguments are experimental. Related to [#2778](https://github.com/Rdatatable/data.table/issues/2778) (and [#624](https://github.com/Rdatatable/data.table/issues/624), [#626](https://github.com/Rdatatable/data.table/issues/626), [#1855](https://github.com/Rdatatable/data.table/issues/1855)). Other rolling statistics will follow.


#### BUG FIXES

11 changes: 11 additions & 0 deletions R/froll.R
@@ -0,0 +1,11 @@
froll <- function(fun, x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE, verbose=getOption("datatable.verbose")) {
stopifnot(!missing(fun), is.character(fun), length(fun)==1L, !is.na(fun))
algo = match.arg(algo)
align = match.arg(align)
ans = .Call(CfrollfunR, fun, x, n, fill, algo, align, na.rm, hasNA, adaptive, verbose)
ans
}

frollmean <- function(x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE, verbose=getOption("datatable.verbose")) {
froll(fun="mean", x=x, n=n, fill=fill, algo=algo, align=align, na.rm=na.rm, hasNA=hasNA, adaptive=adaptive, verbose=verbose)
}
147 changes: 147 additions & 0 deletions man/froll.Rd
@@ -0,0 +1,147 @@
\name{roll}
\alias{roll}
\alias{froll}
\alias{rolling}
\alias{sliding}
\alias{moving}
\alias{frollmean}
\alias{frollsum}
\title{Rolling functions}
\description{
Fast rolling functions to calculate aggregates on a sliding window.
}

\usage{
frollmean(x, n, fill=NA, algo=c("fast", "exact"), align=c("right",
"left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE,
verbose=getOption("datatable.verbose"))
}
\arguments{
\item{x}{ vector, list, data.frame or data.table of numeric fields. }
\item{n}{ integer vector, for adaptive rolling function also list of
integer vectors, rolling window size. }
\item{fill}{ numeric, value to pad by, default \code{NA}. }
\item{algo}{ character, default \code{"fast"}. When set to \code{"exact"}
a slower algorithm is used. It suffers less from floating point
rounding error, performs an extra pass to correct rounding error,
and carefully handles all non-finite values. If available
it will use multiple cores. See details for more information. }
\item{align}{ character, define if window frame covers preceding rows
\code{"right"}, following rows \code{"left"} or centered
\code{"center"}, default \code{"right"}. }
\item{na.rm}{ logical, should missing values be removed when
calculating window, default \code{FALSE}. For details on handling
other non finite values see details below. }
\item{hasNA}{ logical, if it is known that \code{x} contains \code{NA}
then setting to \code{TRUE} will speed up, default \code{NA}. }
\item{adaptive}{ logical, should adaptive rolling function be
calculated, default \code{FALSE}. See details below. }
\item{verbose}{ logical, default \code{getOption("datatable.verbose")},
\code{TRUE} turns on status and information messages to the console,
it also disables parallel processing. }
}
\details{
\code{froll*} functions accept vectors, lists, data.frames or
data.tables. They always return a list except when the input is a
\code{vector} and \code{length(n)==1}, in which case a \code{vector}
is returned so that the result can be used conveniently within
data.table's syntax.

Argument \code{n} accepts multiple values to calculate multiple rolling
windows. If \code{adaptive=TRUE} then it expects a list; each list
element must be an integer vector of window sizes, one for every
\code{column[row]} from \code{x}.

When \code{algo="fast"} is used then any \code{NaN, +Inf, -Inf} is
treated as \code{NA}. For precise handling of non-finite values use
\code{algo="exact"}.
Setting \code{algo="exact"} makes rolling functions perform extra
computation to correct floating point rounding error. This is useful
mostly when the input data has distant outliers. It also handles
\code{NaN, +Inf, -Inf} consistently with base R.

Adaptive rolling functions are a special case in which each single
observation has its own rolling window width. Due to the logic
of these functions the following restrictions apply:
\itemize{
\item{ \code{align} only \code{"right"}. }
\item{ if list of integer vectors is passed to \code{x} then all
list vectors must have equal length. }
}

When multiple columns or multiple window widths are provided then they
are run in parallel. Nested parallelism may additionally occur when
\code{algo="exact"}, see examples.
}
\value{
A list except when the input is a \code{vector} and
\code{length(n)==1} in which case a \code{vector} is returned.
}
\note{
Users coming from \code{zoo}, the most popular package for rolling
functions, should expect the following differences in the
\code{data.table} implementation.
\itemize{
\item{ rolling function will always return same length of results
as provided input. }
\item{ \code{fill} by default \code{NA}. }
\item{ \code{fill} accepts only constant values, no support for
\emph{na.locf} or other functions. }
\item{ \code{align} is by default \code{"right"}. }
\item{ \code{na.rm} is respected, no need to use other function
when having \code{NA} values. }
\item{ integers are always coerced to double. }
\item{ when \code{adaptive=FALSE} (default) then \code{n} must be a
numeric vector, list is not accepted. }
\item{ when \code{adaptive=TRUE} then \code{n} must be vector of
length equal to \code{nrow(x)}, or list of such vectors. }
\item{ there is no \code{partial} window support. }
}
}
\examples{
d = as.data.table(list(1:6/2, 3:8/4))
# rollmean of single vector and single window
frollmean(d[, V1], 3)
# multiple columns at once
frollmean(d, 3)
# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))
# multiple columns and multiple windows at once
frollmean(d, c(3, 4))
## three above are embarrassingly parallel using openmp

# performance vs exactness
set.seed(108)
x = sample(c(rnorm(1e3, 1e6, 5e5), 5e9, 5e-9))
n = 15
ma = function(x, n, na.rm=FALSE) {
ans = rep(NA_real_, nx<-length(x))
for (i in n:nx) ans[i] = mean(x[(i-n+1):i], na.rm=na.rm)
ans
}
fastma = function(x, n, na.rm) {
if (!missing(na.rm)) stop("NAs are unsupported, wrongly propagated by cumsum")
cs = cumsum(x)
scs = shift(cs, n)
scs[n] = 0
as.double((cs-scs)/n)
}
system.time(ans1<-ma(x, n))
system.time(ans2<-fastma(x, n))
system.time(ans3<-frollmean(x, n, algo="exact")) # parallel using openmp again
system.time(ans4<-frollmean(x, n))
anserr = list(
froll_exact_f = ans4-ans1,
froll_exact_t = ans3-ans1,
fastma = ans2-ans1
)
errs = sapply(lapply(anserr, abs), sum, na.rm=TRUE)
sapply(errs, format, scientific=FALSE) # roundoff
}
\seealso{
\code{\link{shift}}, \code{\link{data.table}}
}
\references{
\href{https://en.wikipedia.org/wiki/Round-off_error}{Round-off error}
}
\keyword{ data }
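The trade-off the manual describes between `algo="fast"` and `algo="exact"` can be illustrated with a minimal C sketch. This is not the code from `src/froll.c`: the names `roll_mean_fast` and `roll_mean_exact` are hypothetical, only right alignment is covered, and missing or non-finite values are assumed absent. The first function keeps one running sum and updates it in constant time per observation; the second sums every window from scratch, so a large value that has left the window cannot leave rounding residue behind.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration, not the functions added in this PR.
   Right-aligned rolling mean over k observations; positions that do
   not yet have a full window receive `fill`. No NA handling. */

/* "fast" flavour: a single running sum, O(1) work per observation. */
void roll_mean_fast(const double *x, size_t nx, int k, double fill, double *out) {
  assert(k > 0);
  long double w = 0.0L;                      /* running window sum */
  for (size_t i = 0; i < nx; i++) {
    w += x[i];                               /* newest observation enters */
    if (i >= (size_t)k) w -= x[i - (size_t)k]; /* oldest observation leaves */
    out[i] = (i + 1 >= (size_t)k) ? (double)(w / k) : fill;
  }
}

/* "exact" flavour: each window is summed from scratch, O(k) work per
   observation, so earlier values never contaminate later windows. */
void roll_mean_exact(const double *x, size_t nx, int k, double fill, double *out) {
  assert(k > 0);
  for (size_t i = 0; i < nx; i++) {
    if (i + 1 < (size_t)k) { out[i] = fill; continue; }
    long double w = 0.0L;
    for (size_t j = i + 1 - (size_t)k; j <= i; j++) w += x[j];
    out[i] = (double)(w / k);                /* cast only after the division */
  }
}
```

On well-scaled input the two functions agree; they can differ in the last digits when the input mixes very large and very small magnitudes, which is the situation the `\examples` section constructs with `5e9` and `5e-9`.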
13 changes: 13 additions & 0 deletions src/data.table.h
@@ -6,6 +6,7 @@
#include <stdint.h> // for uint64_t rather than unsigned long long
#include <stdbool.h>
#include "myomp.h"
#include "types.h"

// data.table depends on R>=3.0.0 when R_xlen_t was introduced
// Before R 3.0.0, RLEN used to be switched to R_len_t as R_xlen_t wasn't available.
@@ -152,3 +153,15 @@ double wallclock();
int getDTthreads();
void avoid_openmp_hang_within_fork();

// froll.c
void frollmean(unsigned int algo, double *x, uint_fast64_t nx, double_ans_t *ans, int k, int align, double fill, bool narm, int hasna, bool verbose);
void frollmeanFast(double *x, uint_fast64_t nx, double_ans_t *ans, int k, double fill, bool narm, int hasna, bool verbose);
void frollmeanExact(double *x, uint_fast64_t nx, double_ans_t *ans, int k, double fill, bool narm, int hasna, bool verbose);

// frolladaptive.c
void fadaptiverollmean(unsigned int algo, double *x, uint_fast64_t nx, double_ans_t *ans, int *k, double fill, bool narm, int hasna, bool verbose);
void fadaptiverollmeanFast(double *x, uint_fast64_t nx, double_ans_t *ans, int *k, double fill, bool narm, int hasna, bool verbose);
void fadaptiverollmeanExact(double *x, uint_fast64_t nx, double_ans_t *ans, int *k, double fill, bool narm, int hasna, bool verbose);

// frollR.c
SEXP frollfunR(SEXP fun, SEXP obj, SEXP k, SEXP fill, SEXP algo, SEXP align, SEXP narm, SEXP hasNA, SEXP adaptive, SEXP verbose);
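The adaptive declarations above take `int *k` instead of `int k`, that is, one window width per observation. A minimal sketch of that idea follows, assuming right alignment and no missing values. The function `adaptive_roll_mean` is hypothetical and writes into a plain `double *` buffer, because the `double_ans_t` type lives in `types.h`, which is not part of the diff shown here. The cumulative-sum approach is one simple way to do it and is not claimed to match `frolladaptive.c`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch, not the PR's fadaptiverollmean*.
   out[i] is the mean of the k[i] observations ending at x[i];
   `fill` is written where fewer than k[i] observations are available.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int adaptive_roll_mean(const double *x, size_t nx, const int *k, double fill, double *out) {
  if (nx == 0) return 0;
  long double *cs = malloc(nx * sizeof *cs); /* cumulative sums of x */
  if (cs == NULL) return -1;
  long double w = 0.0L;
  for (size_t i = 0; i < nx; i++) { w += x[i]; cs[i] = w; }
  for (size_t i = 0; i < nx; i++) {
    assert(k[i] > 0);                        /* window of size 0 is not allowed */
    size_t ki = (size_t)k[i];
    if (i + 1 < ki)       out[i] = fill;     /* window reaches before the start */
    else if (i + 1 == ki) out[i] = (double)(cs[i] / ki);
    else                  out[i] = (double)((cs[i] - cs[i - ki]) / ki);
  }
  free(cs);
  return 0;
}
```

Because each result depends only on two entries of `cs`, the second loop has no dependency between iterations, which is the property that makes this kind of computation a candidate for an OpenMP parallel loop.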