Binary file modified .DS_Store
Binary file not shown.
2 changes: 2 additions & 0 deletions R/binseg.R
@@ -82,6 +82,7 @@ binseg <- structure(function # Binary segmentation
## splits. For l1/laplace distributions the best case is O(N log N
## log K) time for equal splits and worst case is O(N log N K) time
## for unequal splits.
switch(distribution.str, l1=Sys.sleep(0.001), meanvar_norm=Sys.sleep(0.00001*length(data.vec)), mean_norm=matrix(NA, length(data.vec), length(data.vec)))
result <- binseg_interface(
data.vec, weight.vec, max.segments,
min.segment.length,
@@ -287,3 +288,4 @@ coef.binsegRcpp <- function
}, by="segments"]
### data.table with one row for each segment.
}

107 changes: 2 additions & 105 deletions README.org
@@ -1,105 +1,2 @@
this is another branch.

binsegRcpp: an efficient implementation of the binary segmentation
heuristic algorithm for changepoint detection, using C++
std::multiset. It also contains functions for comparing empirical time
complexity to the best and worst cases.

| [[file:tests/testthat][tests]] | [[https://github.com/tdhock/binsegRcpp/actions][https://github.com/tdhock/binsegRcpp/workflows/R-CMD-check/badge.svg]] |
| [[https://github.com/jimhester/covr][coverage]] | [[https://app.codecov.io/gh/tdhock/binsegRcpp?branch=master][https://codecov.io/gh/tdhock/binsegRcpp/branch/master/graph/badge.svg]] |

** Installation

#+BEGIN_SRC R
install.packages("binsegRcpp")
##OR
if(!require("remotes"))install.packages("remotes")
remotes::install_github("tdhock/binsegRcpp")
#+END_SRC

** Usage

The main function is =binseg= for which you must at least specify the
first two arguments:
- =distribution.str= specifies the loss function to minimize.
- =data.vec= is a numeric vector of data to segment.

#+BEGIN_SRC R
> x <- c(0.1, 0, 1, 1.1, 0.1, 0)
> (models.dt <- binsegRcpp::binseg("mean_norm", x))
binary segmentation model:
segments end loss validation.loss
<int> <int> <num> <num>
1: 1 6 1.348333e+00 0
2: 2 4 1.015000e+00 0
3: 3 2 1.500000e-02 0
4: 4 3 1.000000e-02 0
5: 5 5 5.000000e-03 0
6: 6 1 -3.339343e-16 0
#+END_SRC

The result above summarizes the data that are computed during the
binary segmentation algorithm. It has a special class with dedicated
methods:

#+BEGIN_SRC R
> class(models.dt)
[1] "binsegRcpp" "list"
> methods(class="binsegRcpp")
[1] coef plot print
see '?methods' for accessing help and source code
#+END_SRC
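
To make the =loss= column of the =mean_norm= output concrete, here is a
minimal base R sketch of greedy binary segmentation with the square
loss. The names =square_loss= and =naive_binseg= are hypothetical
helpers written for this illustration, not part of binsegRcpp, and the
sketch recomputes every candidate split from scratch, so it is far
slower than the package. On the six data points used above it should
give the same loss values, up to floating point rounding in the last
row. Several splits are tied on these data, so the order of the =end=
column may differ from the package output.

#+BEGIN_SRC R
## Square loss of one segment x[start:end] around its own mean.
square_loss <- function(x, start, end) {
  seg <- x[start:end]
  sum((seg - mean(seg))^2)
}

## Naive greedy binary segmentation (illustration only).
naive_binseg <- function(x, max.segments = length(x)) {
  n <- length(x)
  starts <- 1L
  ends <- n
  losses <- square_loss(x, 1L, n)
  out.end <- n
  out.loss <- losses
  while (length(starts) < max.segments) {
    best <- NULL
    for (i in seq_along(starts)) {
      if (starts[i] == ends[i]) next # a single point cannot be split.
      for (cut in starts[i]:(ends[i] - 1L)) {
        left <- square_loss(x, starts[i], cut)
        right <- square_loss(x, cut + 1L, ends[i])
        decrease <- losses[i] - left - right
        if (is.null(best) || decrease > best$decrease) {
          best <- list(i = i, cut = cut, left = left, right = right,
                       decrease = decrease)
        }
      }
    }
    if (is.null(best)) break # no segment can be split any further.
    i <- best$i
    ## The right part becomes a new segment, the left part replaces i.
    starts <- c(starts, best$cut + 1L)
    ends <- c(ends, ends[i])
    losses <- c(losses, best$right)
    ends[i] <- best$cut
    losses[i] <- best$left
    out.end <- c(out.end, best$cut)
    out.loss <- c(out.loss, sum(losses))
  }
  data.frame(segments = seq_along(out.end), end = out.end, loss = out.loss)
}

x <- c(0.1, 0, 1, 1.1, 0.1, 0)
fit.naive <- naive_binseg(x)
print(fit.naive)
#+END_SRC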

The coef method returns a data table of segment means:

#+BEGIN_SRC R
> coef(models.dt, segments=2:3)
segments start end start.pos end.pos mean
<int> <int> <int> <num> <num> <num>
1: 2 1 4 0.5 4.5 0.55
2: 2 5 6 4.5 6.5 0.05
3: 3 1 2 0.5 2.5 0.05
4: 3 3 4 2.5 4.5 1.05
5: 3 5 6 4.5 6.5 0.05
#+END_SRC
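
The segment means in the =coef= output can be recomputed by hand from
the segment ends. The sketch below uses only base R; =segment_means= is
a hypothetical helper for illustration, and the =start.pos= / =end.pos=
convention (half a data point on either side) is inferred from the
table above rather than from the package source.

#+BEGIN_SRC R
## Means of the segments of x delimited by the given end indices.
segment_means <- function(x, ends) {
  ends <- sort(ends)
  starts <- c(1, head(ends, -1) + 1)
  data.frame(
    start = starts,
    end = ends,
    start.pos = starts - 0.5,
    end.pos = ends + 0.5,
    mean = mapply(function(s, e) mean(x[s:e]), starts, ends))
}

x <- c(0.1, 0, 1, 1.1, 0.1, 0)
two.segs <- segment_means(x, c(6, 4))      # model with 2 segments.
three.segs <- segment_means(x, c(6, 4, 2)) # model with 3 segments.
print(two.segs)
print(three.segs)
#+END_SRC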

Demo of poisson loss and non-uniform weights:

#+begin_src R
> data.vec <- c(3,4,10,20)
> (fit1 <- binsegRcpp::binseg("poisson", data.vec, weight.vec=c(1,1,1,10)))
binary segmentation model:
segments end loss validation.loss
<int> <int> <num> <num>
1: 1 4 -393.8437 0
2: 2 3 -411.6347 0
3: 3 2 -413.9416 0
4: 4 1 -414.0133 0
#+end_src
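
The loss values printed above are consistent with a weighted Poisson
loss of the form sum(w * (m - x * log(m))), where m is the weighted
mean of the segment. That formula is an assumption inferred from the
printed numbers, not taken from the package source. =poisson_loss= and
=total_poisson_loss= are hypothetical helpers for this base R sketch.

#+begin_src R
## Weighted Poisson loss of one segment, evaluated at its weighted mean.
## Assumes sum(w * x) > 0, otherwise log(0) appears.
poisson_loss <- function(x, w = rep(1, length(x))) {
  S <- sum(w * x)
  m <- S / sum(w)
  S - S * log(m) # same as sum(w * (m - x * log(m)))
}

## Total loss of a segmentation given by its sorted segment ends.
total_poisson_loss <- function(x, w, ends) {
  starts <- c(1, head(ends, -1) + 1)
  sum(mapply(function(s, e) poisson_loss(x[s:e], w[s:e]), starts, ends))
}

data.vec <- c(3, 4, 10, 20)
weight.vec <- c(1, 1, 1, 10)
loss.by.size <- c(
  total_poisson_loss(data.vec, weight.vec, 4),
  total_poisson_loss(data.vec, weight.vec, c(3, 4)),
  total_poisson_loss(data.vec, weight.vec, c(2, 3, 4)),
  total_poisson_loss(data.vec, weight.vec, 1:4))
print(loss.by.size)
#+end_src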

Demo of change in mean and variance for normal distribution:

#+begin_src R
> sim <- function(mu,sigma)rnorm(10000,mu,sigma)
> set.seed(1)
> data.vec <- c(sim(5,1), sim(0, 5))
> fit <- binsegRcpp::binseg("meanvar_norm", data.vec)
> coef(fit, 2L)
segments start end start.pos end.pos mean var
<int> <int> <int> <num> <num> <num> <num>
1: 2 1 10000 0.5 10000.5 4.99346296 1.024763
2: 2 10001 20000 10000.5 20000.5 -0.02095033 24.538556
#+end_src
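
The =var= column above appears to be the maximum likelihood estimate
(dividing by N rather than N-1). That is an assumption based on the
printed values, not on the package source. Under that assumption, the
base R sketch below recomputes the per-segment estimates for a known
changepoint at 10000; =mle_mean_var= is a hypothetical helper. If the
assumption holds, the two rows should be close to the =coef= output
above.

#+begin_src R
## Maximum likelihood mean and variance of one segment.
mle_mean_var <- function(seg) {
  m <- mean(seg)
  c(mean = m, var = mean((seg - m)^2))
}

sim <- function(mu, sigma) rnorm(10000, mu, sigma)
set.seed(1)
data.vec <- c(sim(5, 1), sim(0, 5))
est <- rbind(
  mle_mean_var(data.vec[1:10000]),
  mle_mean_var(data.vec[10001:20000]))
print(est)
#+end_src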

** Related work

Other implementations of binary segmentation include
[[https://github.com/rkillick/changepoint/][changepoint::cpt.mean(method="BinSeg")]] (quadratic storage in max
number of segments), [[https://github.com/diego-urgell/BinSeg][BinSeg::BinSegModel()]] (same linear storage as
binsegRcpp), and [[https://github.com/deepcharles/ruptures][ruptures.Binseg()]] (unknown storage). [[https://github.com/tdhock/binseg-model-selection][Figures comparing the timings]].

This version uses the [[http://www.rcpp.org/][Rcpp]]/.Call interface whereas the [[https://github.com/tdhock/binseg][binseg]] package
uses the .C interface.

See [[branches][branches]] for variations of the interface to use as test cases in
[[https://github.com/NAU-CS/RcppDeepState][RcppDeepState]] development.
This is another branch.
=R/binseg.R= intentionally slows down performance: it calls =Sys.sleep= for the l1 and meanvar_norm distributions, and allocates an N x N matrix for mean_norm.