Bayesian online change point detection

An R implementation of the paper *Bayesian Online Changepoint Detection* (Ryan Prescott Adams and David J.C. MacKay, 2007), with derivations, additional conjugate distributions, and examples.

Usage

Load the function onlinechangepoint() directly from source:

source("https://raw.githubusercontent.com/chenhaotian/Changepoints/master/online_detection/bayesian.r") 

Parameters:

args(onlinechangepoint) 
  • X : numeric, observations.
  • model : character, the observation model to use; one of c("nng","pg","bb","g"):
    • nng: normal evidence (observation distribution) with a normal-gamma prior
    • pg: Poisson evidence with a gamma prior
    • bb: binomial evidence with a beta prior
    • g: gamma evidence
  • mu0, k0, alpha0, beta0 : numeric, hyperparameters.
    • mu0, k0, alpha0, beta0: normal-gamma parameters, when model="nng"
    • alpha0, beta0: gamma parameters, when model="pg" (the names alpha0 and beta0 are shared with "nng")
    • alpha0, beta0: beta parameters, when model="bb"
  • lambda : numeric, parameter of the exponential hazard function (which acts as the transition distribution).
  • FILTER : if P(r_t | x_{1:t}) < FILTER, that r_t is omitted from the next round of the online calculation.
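The pruning implied by FILTER can be sketched in a few lines (a toy illustration of the idea; `prune` is an illustrative name, not a function from this repository):

```r
## Sketch of the FILTER pruning step: run lengths whose posterior mass
## falls below the threshold are dropped, and the survivors are
## renormalized before the next round of the recursion.
prune <- function(posterior, FILTER = 1e-3) {
  keep <- posterior >= FILTER
  posterior[keep] / sum(posterior[keep])
}

pruned <- prune(c(0.9990, 0.0006, 0.0004))  # only the first mass survives
```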

See Examples below for details.

Derivation

Adams & MacKay's paper interprets the changepoint detection problem as an infinite-state hidden Markov model. The definitions:

  • $r_t$ is the length of time since the last changepoint (the "run length"); $x_t^{(r)}$ denotes the set of observations associated with run $r_t$. As $r_t$ may be zero, $x_t^{(r)}$ may be empty. The hidden states are the run lengths of the current observation; at each step a run length either grows by one or drops to zero.
  • For each partition $\rho$, the data within it are i.i.d. from some probability distribution $P(x_t \mid \eta_\rho)$.
  • The discrete a priori (a Latin phrase meaning "from the earlier", in contrast to a posteriori, "from the latter") probability distribution over the interval between changepoints is denoted $P_{\text{gap}}(g)$, i.e. the probability of the current run ending at length $g$ (recall the geometric distribution).
  • The quantities of interest are the predictive distribution $P(x_{t+1} \mid x_{1:t})$, the posterior distribution $P(r_t \mid x_{1:t})$, and the joint distribution $P(r_t, x_{1:t})$.

With the above definitions, our goal is to recursively (recursion is what makes the online calculation possible) compute the probability distribution of the length of the current "run", or time since the last changepoint; denote this posterior distribution as $P(r_t \mid x_{1:t})$. According to Bayes' theorem,

$$P(r_t \mid x_{1:t}) = \frac{P(r_t, x_{1:t})}{P(x_{1:t})}$$

so we can divide the calculation of the conditional distribution into two subproblems: 1. calculate the joint distribution $P(r_t, x_{1:t})$; 2. calculate the normalizing constant $P(x_{1:t})$. Subproblem 2 is easy once subproblem 1 is solved: summing out $r_t$ gives $P(x_{1:t}) = \sum_{r_t} P(r_t, x_{1:t})$. So basically we only need to focus on subproblem 1, the derivation of $P(r_t, x_{1:t})$. Keep in mind that, to make online calculation possible, the derived representation must be recursive: when the previous information is known, we need a relation between it and the current estimate. Here the previous information is $P(r_{t-1}, x_{1:t-1})$ and the current estimate is $P(r_t, x_{1:t})$. To relate these two distributions, first marginalize $P(r_t, x_{1:t})$ over $r_{t-1}$, then use Bayes' rule to extract $P(r_{t-1}, x_{1:t-1})$:

$$P(r_t, x_{1:t}) = \sum_{r_{t-1}} P(r_t, r_{t-1}, x_{1:t}) = \sum_{r_{t-1}} P(r_t \mid r_{t-1})\, P(x_t \mid r_{t-1}, x_t^{(r)})\, P(r_{t-1}, x_{1:t-1})$$
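One step of this recursion can be sketched in a few lines of R, assuming a constant hazard rate H (`recursion_step` is an illustrative name, not part of this repository):

```r
## One step of the run-length recursion. `joint` holds the previous
## joint P(r_{t-1}, x_{1:t-1}) for r_{t-1} = 0, ..., t-2, `pred` the
## predictive probabilities P(x_t | r_{t-1}, x_t^(r)), H the hazard.
recursion_step <- function(joint, pred, H) {
  growth <- joint * pred * (1 - H)   # run continues: r_t = r_{t-1} + 1
  cp     <- sum(joint * pred * H)    # changepoint:   r_t = 0
  c(cp, growth)                      # new joint over r_t = 0, ..., t-1
}

joint     <- recursion_step(c(0.5, 0.5), c(0.2, 0.4), H = 1/50)
posterior <- joint / sum(joint)      # P(r_t | x_{1:t}), normalized
```

Normalizing the resulting vector solves subproblem 2 as a by-product, exactly as described above.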

Remark:

  • By definition, $x_t^{(r)} \subseteq x_{1:t-1}$. More precisely, $x_t^{(r)}$ is the last $r_{t-1}$ (possibly zero) elements of $x_{1:t-1}$.
  • $P(x_t \mid r_{t-1}, x_{1:t-1})$ relies only on $x_t^{(r)}$, i.e. $P(x_t \mid r_{t-1}, x_{1:t-1}) = P(x_t \mid r_{t-1}, x_t^{(r)})$. Intuitively, if $x_t$ and $x_t^{(r)}$ belong to the same partition, the predictive probability of $x_t$ is $P(x_t \mid \eta)$, where $\eta$ is determined by $x_t^{(r)}$, so the equality holds. On the other hand, if $x_t$ belongs to a new partition, then $x_t$ is irrelevant to all of $x_{1:t-1}$, and the equality still holds.
  • $P(r_t \mid r_{t-1}, x_{1:t-1})$ depends only on $r_{t-1}$, i.e. $P(r_t \mid r_{t-1}, x_{1:t-1}) = P(r_t \mid r_{t-1})$.

According to the decomposition above, subproblem 1 is further divided into three smaller subproblems: (a) construct a boundary condition, i.e. the initial distribution of the run length (when $t = 0$, $x_{1:0}$ is an empty set, so this simplifies to constructing $P(r_0)$); (b) construct a transition distribution $P(r_t \mid r_{t-1})$; (c) construct a predictive distribution $P(x_t \mid r_{t-1}, x_t^{(r)})$. As to (a), the simplest way is to assume a changepoint occurred before the first data point. In that case we place all of the probability mass for the initial run length at zero, i.e. $P(r_0 = 0) = 1$. As to (b), $P(r_t \mid r_{t-1})$ has nonzero mass at only two outcomes: either the run length continues to grow, $r_t = r_{t-1} + 1$, or a changepoint occurs and $r_t = 0$. This reasoning is the same as the one used in inferring mortality rates, which is conveniently expressed through a hazard function $H(\cdot)$. Here we have:

$$P(r_t \mid r_{t-1}) = \begin{cases} H(r_{t-1}+1) & \text{if } r_t = 0 \\ 1 - H(r_{t-1}+1) & \text{if } r_t = r_{t-1}+1 \\ 0 & \text{otherwise} \end{cases}$$

where

$$H(\tau) = \frac{P_{\text{gap}}(g = \tau)}{\sum_{t'=\tau}^{\infty} P_{\text{gap}}(g = t')}$$

Any prior information about run-length transitions can easily be incorporated through the form of $P_{\text{gap}}$. For example, if $P_{\text{gap}}$ is an exponential (or geometric, if discrete) distribution with timescale $\lambda$, then the hazard function is constant (independent of the previous run length): $H(\tau) = 1/\lambda$. This amounts to making no prior assumption about run-length transitions. As to (c): recall that, from the Bayesian point of view, if every data point $x$ follows a distribution $P(x \mid \eta)$, and $\eta$ follows a prior distribution $P(\eta \mid \theta)$, where $\theta$ denotes the hyperparameters, then the posterior distribution of $\eta$ satisfies $P(\eta \mid x, \theta) \propto P(x \mid \eta)\, P(\eta \mid \theta)$. Here, let $\eta$ be the parameter inferred from $x_t^{(r)}$; then we have:

$$P(\eta \mid x_t^{(r)}, \theta) \propto P(x_t^{(r)} \mid \eta)\, P(\eta \mid \theta)$$

The predictive distribution can then be calculated by marginalizing over $\eta$ (the Bayesian prediction problem):

$$P(x_t \mid r_{t-1}, x_t^{(r)}) = \int P(x_t \mid \eta)\, P(\eta \mid x_t^{(r)}, \theta)\, d\eta$$

So far we have only one problem left: the derivation of $P(\eta \mid x_t^{(r)}, \theta)$. It has three parts: the likelihood function $P(x_t^{(r)} \mid \eta)$ (or the sample distribution; the two technically mean the same thing), the prior distribution $P(\eta \mid \theta)$, and the normalizing constant. Because the normalizing constant is already dealt with in subproblem 2, here we only need to find a proper likelihood function and prior distribution. In general, the posterior distribution of $\eta$ must be calculated directly from $x_t^{(r)}$, which can be very time-consuming. But if both the likelihood and the prior are from the exponential family, the posterior will also be from the exponential family, and allows inference with a finite number of sufficient statistics which can be updated incrementally as data arrives. Consider the general case of a normal likelihood with unknown mean $\mu$ and precision $\tau$ (the inverse of the variance, $\tau = 1/\sigma^2$). Usually, if $\tau$ is known, $\mu$ is given a normal prior with mean $\mu_0$ and precision $k_0\tau$; and if $\mu$ is known, $\tau$ is given a gamma prior with shape $\alpha_0$ and rate (1/scale) $\beta_0$. When both $\mu$ and $\tau$ are unknown, it is usual to assume they follow a normal-gamma distribution with parameters $\mu_0$, $k_0$, $\alpha_0$ and $\beta_0$:

$$P(\mu, \tau) = \mathcal{N}\!\left(\mu \mid \mu_0, (k_0\tau)^{-1}\right)\, \text{Gamma}(\tau \mid \alpha_0, \beta_0)$$

After observing $x_1, \dots, x_n$ with sample mean $\bar{x}$, the posterior is again normal-gamma, with

$$\mu_n = \frac{k_0\mu_0 + n\bar{x}}{k_0 + n}, \quad k_n = k_0 + n, \quad \alpha_n = \alpha_0 + \frac{n}{2}, \quad \beta_n = \beta_0 + \frac{1}{2}\sum_{i=1}^{n}(x_i - \bar{x})^2 + \frac{k_0\, n\, (\bar{x} - \mu_0)^2}{2(k_0 + n)}$$
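As a sketch, this conjugate update can be written directly in R, using the standard normal-gamma formulas (`ng_update` is an illustrative name, not a function from this repository):

```r
## Normal-gamma posterior update for a batch of observations x, with
## prior hyperparameters mu0, k0, alpha0, beta0 as in the text.
ng_update <- function(x, mu0, k0, alpha0, beta0) {
  n <- length(x); xbar <- mean(x)
  list(mu    = (k0 * mu0 + n * xbar) / (k0 + n),
       k     = k0 + n,
       alpha = alpha0 + n / 2,
       beta  = beta0 + 0.5 * sum((x - xbar)^2) +
               k0 * n * (xbar - mu0)^2 / (2 * (k0 + n)))
}

post <- ng_update(c(1, 2, 3), mu0 = 0, k0 = 1, alpha0 = 1/2, beta0 = 1)
```

In the online algorithm the same update is applied incrementally, one observation at a time, for each surviving run length.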

See *Exponential family* and *Conjugate Bayesian analysis of the Gaussian distribution* for more information.

**Note**

  1. In the above example, $(k_0, 2\alpha_0)$ act as the effective numbers of prior observations, and $(k_0\mu_0, 2\beta_0)$ as the corresponding prior contributions to the sufficient statistics, respectively.
  2. In the Bayesian prediction problem, calculating the integral $\int P(x_t \mid \eta)\, P(\eta \mid x_t^{(r)}, \theta)\, d\eta$ directly can be too time-consuming. Alternatively, we can use the expected value of $\eta$ instead of the whole distribution to approximate the probability, i.e. $P(x_t \mid r_{t-1}, x_t^{(r)}) \approx P\!\left(x_t \mid \mathbb{E}[\eta \mid x_t^{(r)}, \theta]\right)$.
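For the normal-gamma case, this posterior-mean approximation can be sketched as follows (presumably what the bpmethod = "mean" argument in the examples refers to; this is a sketch, not the package's exact code):

```r
## Approximate the predictive P(x_t | r_{t-1}, x_t^(r)) by plugging in
## the posterior means E[mu] = mu_n and E[tau] = alpha_n / beta_n,
## instead of integrating over the full normal-gamma posterior.
pred_mean_approx <- function(xnew, mun, an, bn) {
  tau_hat <- an / bn                         # posterior mean precision
  dnorm(xnew, mean = mun, sd = 1 / sqrt(tau_hat))
}

p <- pred_mean_approx(0, mun = 0, an = 1, bn = 1)  # equals dnorm(0)
```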

Examples

Here are several examples of applying the algorithm.

1. Level change (normal observation):

Different mean and same volatility.

## Generate random samples with real changepoints: 100,170,240,310,380 
X <- c(rnorm(100,sd=0.5),rnorm(70,mean=5,sd=0.5),rnorm(70,mean = 2,sd=0.5),rnorm(70,sd=0.5),rnorm(70,mean = 7,sd=0.5)) 
## online changepoint detection for series X 
resX <- onlinechangepoint(X, 
                          model = "nng", 
                          mu0=0.7,k0=1,alpha0=1/2,beta0=1, #initial parameters 
                          bpmethod = "mean", 
                          lambda=50, #exponential hazard 
                          FILTER=1e-3) 
tmpplot(resX,X) # visualize original data and run length distribution 

The last statement tmpplot(resX,X) produces a graph comparing the original data (upper part) with the run-length distribution (lower part):

2. Volatility change (normal observation):

Different volatility and same mean.

## Generate random samples with real changepoints: 100,170,240,310,380. 
Y <- c(rnorm(100,sd=0.5),rnorm(70,sd=1),rnorm(70,sd=3),rnorm(70,sd=1),rnorm(70,sd=0.5)) 
## online changepoint detection for series Y 
resY <- onlinechangepoint(Y, 
                          model = "nng", 
                          mu0=0,k0=0.5,alpha0=1/2,beta0=1, 
                          bpmethod = "mean", 
                          lambda=50, #exponential hazard 
                          FILTER=1e-3) 
tmpplot(resY,Y) 

Original data and inferred run length distribution:

3. Level change (Poisson observation):
Z <- c(rpois(100,lambda=5),rpois(70,lambda=7),rpois(70,lambda=5),rpois(70,lambda=6),rpois(70,lambda=5)) 
## online changepoint detection for series Z 
resZ <- onlinechangepoint(Z, 
                          model = "pg", 
                          alpha0=10,beta0=1, 
                          bpmethod = "mean", 
                          lambda=10, #exponential hazard 
                          FILTER=1e-3) 
tmpplot(resZ,Z) 

Original data and inferred run length distribution:

Reference

[1] Bayesian Online Changepoint Detection

[2] Checking whether a coin is fair

[3] DeGroot, Optimal Statistical Decisions, chapter 9

[4] Gamma distribution

[5] Fink, D. 1995 A Compendium of Conjugate Priors. In progress report: Extension and enhancement of methods for setting data quality objectives. (DOE contract 95‑831).
