An R implementation of the paper Bayesian Online Changepoint Detection (Ryan Prescott Adams and David J.C. MacKay, 2007), with derivation walkthroughs, additional conjugate distributions, and examples:
- Basic idea and function usage
- Derivation process explanations
- Normal evidence distribution and normal-gamma prior + examples
- Poisson evidence distribution and gamma prior + examples
- Binomial evidence distribution and beta prior + examples
- Additional readings
Load the function onlinechangepoint() directly from source:
source("https://raw.githubusercontent.com/chenhaotian/Changepoints/master/online_detection/bayesian.r")
Parameters (see args(onlinechangepoint)):
- X : numeric, observations.
- model : character, specifying the model to be used; one of c("nng","pg","bb","g")
- nng: normal evidence(observation distribution) and normal-gamma prior
- pg: poisson evidence and gamma prior
- bb: binomial evidence and beta prior
- g: gamma evidence
- mu0, k0, alpha0, beta0 : numeric, specifying hyperparameters.
- mu0, k0, alpha0, beta0: normal-gamma parameters, when model="nng"
- alpha0, beta0: gamma parameters, when model="pg" (the names alpha0 and beta0 are shared with "nng")
- alpha0, beta0: beta parameters, when model="bb"
- lambda : numeric, parameter of the exponential hazard function (acts as the transition distribution).
- FILTER : numeric; if P(r_t | x_{1:t}) < FILTER, this r_t will be omitted in the next round of the online calculation.
See Examples below for details.
Adams & MacKay's paper interprets the changepoint detection problem with an infinite-state hidden Markov model. The definitions are as follows: the run length r_t is the time since the last changepoint, and x_t^{(r)} denotes the data set associated with run r_t. As r_t may be zero, x_t^{(r)} may be empty. The hidden states are the run lengths r_t of the current observations, and they evolve in the following way:

- For each partition ρ, the data within it are i.i.d. from some probability distribution P(x_t | η_ρ).
- The discrete a priori (Latin, 'from the earlier', in contrast to a posteriori, 'from the latter') probability distribution over the interval between changepoints is denoted P_gap(g), i.e. the probability of the current run ending at length g (recall the geometric distribution).
- Predictive distribution: P(x_t | r_{t-1}, x_t^{(r)})
- Posterior distribution: P(r_t | x_{1:t})
- Joint distribution: P(r_t, x_{1:t})
With the above definitions, our goal here is to recursively (recursion is what makes the online calculation possible) compute the probability distribution of the length of the current "run", or time since the last changepoint; denote this posterior distribution as P(r_t | x_{1:t}). According to Bayes' theorem,
P(r_t | x_{1:t}) = P(r_t, x_{1:t}) / P(x_{1:t})
we can divide the calculation of this conditional distribution into two subproblems:
1. Calculate the joint distribution P(r_t, x_{1:t}).
2. Calculate the normalizing constant P(x_{1:t}).
As for subproblem 2, P(x_{1:t}) can be easily calculated by summing out r_t in subproblem 1, i.e. P(x_{1:t}) = Σ_{r_t} P(r_t, x_{1:t}). So basically we need only focus on subproblem 1, the derivation of P(r_t, x_{1:t}).
Keep in mind that to make the online calculation possible, the derived representation of subproblem 1 must be recursive, i.e. when the previous information is known, one needs a relation between the previous information and the current estimation. Here in subproblem 1, the previous information is P(r_{t-1}, x_{1:t-1}), and the current estimation is P(r_t, x_{1:t}). In order to relate these two distributions, first we disintegrate P(r_t, x_{1:t}) by r_{t-1}, and then use Bayes' rule to extract P(r_{t-1}, x_{1:t-1}):
P(r_t, x_{1:t}) = Σ_{r_{t-1}} P(r_t, r_{t-1}, x_{1:t})
                = Σ_{r_{t-1}} P(r_t, x_t | r_{t-1}, x_{1:t-1}) P(r_{t-1}, x_{1:t-1})
                = Σ_{r_{t-1}} P(r_t | r_{t-1}) P(x_t | r_{t-1}, x_t^{(r)}) P(r_{t-1}, x_{1:t-1})
Remark:
- By definition, x_t^{(r)} ⊆ x_{1:t-1}. Or more precisely, x_t^{(r)} is the last r_{t-1} (may be zero) elements of x_{1:t-1}.
- P(x_t | r_{t-1}, x_{1:t-1}) relies only on x_t^{(r)}, i.e. P(x_t | r_{t-1}, x_{1:t-1}) = P(x_t | r_{t-1}, x_t^{(r)}). Intuitively, if x_t and x_t^{(r)} belong to the same partition, then the predictive probability of x_t is P(x_t | η), where η is determined by x_t^{(r)}, so we have P(x_t | r_{t-1}, x_{1:t-1}) = P(x_t | r_{t-1}, x_t^{(r)}). On the other hand, if x_t belongs to a new partition, then x_t is irrelevant to x_{1:t-1}, and the equation still holds.
- P(r_t | r_{t-1}, x_{1:t-1}) only depends on r_{t-1}, i.e. P(r_t | r_{t-1}, x_{1:t-1}) = P(r_t | r_{t-1}).
According to the decomposition above, subproblem 1 is further divided into 3 minor subproblems:
(a). Construct a boundary condition, i.e. the initial distribution of the run length P(r_0). When t = 1, the previous run length is r_0; note that the corresponding x_1^{(r)} is an empty set, so the problem simplifies to the construction of P(r_0).
(b). Construct a transition distribution P(r_t | r_{t-1}).
(c). Construct a prediction distribution P(x_t | r_{t-1}, x_t^{(r)}).
As to (a), the simplest way is to assume a changepoint occurred before the first data point. In such cases we place all of the probability mass for the initial run length at zero, i.e. P(r_0 = 0) = 1.
As to (b), P(r_t | r_{t-1}) has non-zero mass at only two outcomes: the run length either continues to grow and r_t = r_{t-1} + 1, or a changepoint occurs and r_t = 0. This reasoning process is the same as the one used in inferring mortality rates, which can be easily interpreted with a hazard function H(τ). Here we have:
P(r_t | r_{t-1}) = H(r_{t-1} + 1)      if r_t = 0
                 = 1 - H(r_{t-1} + 1)  if r_t = r_{t-1} + 1
                 = 0                   otherwise
where
H(τ) = P_gap(g = τ) / Σ_{t'=τ}^{∞} P_gap(g = t')
Any prior information about run length transitions can be easily incorporated through the form of P_gap(g). For example, if P_gap(g) is an exponential (or geometric, if discrete) distribution with timescale λ, then the hazard function is constant (independent of the previous run length): H(τ) = 1/λ. This is the same as making no prior assumptions on run length transitions.
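As a quick numerical check of the constant-hazard claim (a hypothetical snippet, not part of the package; p, tau and H are illustration variables):

```r
## Discrete hazard of a geometric gap distribution with success probability
## p = 1/lambda. dgeom() counts failures before the first success, so a gap
## g = tau corresponds to tau - 1 failures.
p   <- 1 / 50                                         # timescale lambda = 50
tau <- 1:10
H   <- dgeom(tau - 1, p) / pgeom(tau - 2, p, lower.tail = FALSE)
## every entry of H equals p = 1/lambda: the hazard is constant
```

The numerator is P_gap(g = τ) and the denominator is P(g ≥ τ), matching the definition of H(τ) above.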
As to (c), recall that from the Bayesian point of view, if every data point x_t is subject to a distribution P(x_t | η), and η is subject to a prior distribution P(η | θ), where θ is the set of hyperparameters, then the posterior distribution of η satisfies P(η | x_t, θ) ∝ P(x_t | η) P(η | θ). Here in this problem, let η_t^{(r)} be the parameter inferred from x_t^{(r)}; then we have:
P(η | r_{t-1}, x_t^{(r)}) ∝ P(x_t^{(r)} | η) P(η | θ)
And the prediction distribution can be easily calculated when disintegrated by η (the Bayesian prediction problem):
P(x_t | r_{t-1}, x_t^{(r)}) = ∫ P(x_t | η) P(η | r_{t-1}, x_t^{(r)}) dη
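As a concrete instance of this integral (a standard conjugate result, stated here for illustration): for the "pg" model, a Poisson likelihood whose rate λ (not to be confused with the hazard timescale) has a Gamma(α_r, β_r) posterior yields a negative-binomial predictive:

```latex
P(x_t \mid r_{t-1}, x_t^{(r)})
  = \int_0^\infty \mathrm{Pois}(x_t \mid \lambda)\,
      \mathrm{Gamma}(\lambda \mid \alpha_r, \beta_r)\, d\lambda
  = \mathrm{NB}\!\left(x_t;\ \alpha_r,\ \frac{\beta_r}{\beta_r + 1}\right)
```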
So far we have only one problem left, i.e. the derivation of P(η | r_{t-1}, x_t^{(r)}). There are three parts in it: the likelihood function L(η | x_t^{(r)}) (or the sample distribution P(x_t^{(r)} | η); the two are technically implying the same thing), the prior distribution P(η | θ), and the normalizing constant. Because the normalizing constant is already dealt with in subproblem 2, here we only need to find a proper likelihood function and a prior distribution.
Generally, we need to calculate the posterior distribution of η directly from P(x_t^{(r)} | η) P(η | θ), but sometimes this would be very time consuming. In particular, if both the likelihood and the prior distribution are from the exponential family, the posterior distribution will also be from the exponential family, and allows inference with a finite number of sufficient statistics which can be calculated incrementally as data arrives.
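For instance (a hypothetical sketch, assuming the "pg" conjugate pair): with a Gamma(alpha0, beta0) prior on a Poisson rate, each arriving count only increments two numbers, which is exactly the incremental sufficient-statistic update described above.

```r
## Incremental conjugate updating for a Poisson likelihood with a gamma prior:
## after observing a count x, Gamma(alpha, beta) becomes Gamma(alpha + x, beta + 1).
alpha <- 10; beta <- 1            # prior hyperparameters (as in the "pg" example)
for (x in c(4, 6, 5)) {           # counts arriving one at a time
  alpha <- alpha + x              # accumulate the sufficient statistic sum(x)
  beta  <- beta + 1               # accumulate the number of observations
}
## posterior is Gamma(25, 4); posterior mean alpha / beta = 6.25
```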
Here is a general case where there is a normal likelihood function with unknown mean μ and precision τ (the inverse of the variance, τ = 1/σ²). Usually, if τ is known, μ follows a normal distribution with mean μ0 and precision k0 τ; and if μ is known, τ follows a gamma distribution with shape α0 and rate (1/scale) β0. When both μ and τ are unknown, it is usual to assume they follow a normal-gamma distribution with parameters μ0, k0, α0 and β0:
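The normal-gamma density and its standard posterior updates (reproduced from the usual conjugate Gaussian analysis, since the original rendering is missing here) are:

```latex
\mathrm{NormalGamma}(\mu, \tau \mid \mu_0, k_0, \alpha_0, \beta_0)
  = \mathcal{N}\!\left(\mu \,\middle|\, \mu_0, (k_0 \tau)^{-1}\right)
    \mathrm{Gamma}(\tau \mid \alpha_0, \beta_0)
```

and, after observing x_1, ..., x_n with sample mean \bar{x}:

```latex
\mu_n = \frac{k_0 \mu_0 + n \bar{x}}{k_0 + n}, \qquad
k_n = k_0 + n, \qquad
\alpha_n = \alpha_0 + \frac{n}{2}, \qquad
\beta_n = \beta_0 + \frac{1}{2}\sum_{i=1}^n (x_i - \bar{x})^2
        + \frac{k_0\, n\, (\bar{x} - \mu_0)^2}{2\,(k_0 + n)}
```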
See exponential family and Conjugate Bayesian analysis of the Gaussian distribution for more information.
**Note**
- In the above example, the hyperparameter pairs (k0, 2α0) and (k0 μ0, 2β0) act the same as the effective number of observations ν and the amount contributed to the sufficient statistic χ, respectively.
- In the Bayesian prediction problem P(x_t | r_{t-1}, x_t^{(r)}) = ∫ P(x_t | η) P(η | r_{t-1}, x_t^{(r)}) dη, calculating the integral directly can be too time consuming. Alternatively, we can use the expected value of η instead of the whole distribution to approximate the probability, i.e. P(x_t | r_{t-1}, x_t^{(r)}) ≈ P(x_t | E[η]).
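Putting all the pieces together, the recursion can be sketched in a few lines of R. This is a minimal illustration under simplifying assumptions, not the package's implementation: it uses a constant hazard H = 1/lambda and, for brevity, a normal observation model with known standard deviation sigma, where mu0 acts as a single pseudo-observation for each run's mean (bocpd_sketch is a hypothetical name).

```r
## Minimal sketch of the online run-length recursion (illustration only).
bocpd_sketch <- function(X, lambda = 50, mu0 = 0, sigma = 1) {
  H <- 1 / lambda                        # constant hazard from the geometric gap prior
  n <- length(X)
  R <- matrix(0, n + 1, n + 1)           # R[t+1, r+1] = P(r_t = r | x_{1:t})
  R[1, 1] <- 1                           # boundary condition: P(r_0 = 0) = 1
  sums <- mu0; cnts <- 1                 # sufficient statistics per run length
  for (t in seq_len(n)) {
    pred <- dnorm(X[t], mean = sums / cnts, sd = sigma)  # predictive per run length
    R[t + 1, 2:(t + 1)] <- R[t, 1:t] * pred * (1 - H)    # growth: r_t = r_{t-1} + 1
    R[t + 1, 1] <- sum(R[t, 1:t] * pred * H)             # changepoint: r_t = 0
    R[t + 1, ] <- R[t + 1, ] / sum(R[t + 1, ])           # normalize (subproblem 2)
    sums <- c(mu0, sums + X[t])          # update statistics incrementally
    cnts <- c(1, cnts + 1)
  }
  R
}
```

Each row of the returned matrix is the run length distribution after observing one more data point; pruning entries below a threshold like FILTER is what keeps the per-step cost bounded in the package.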
Here are several examples of applying the algorithm.
Different mean and same volatility.
## Generate random samples with real changepoints: 100,170,240,310,380
X <- c(rnorm(100,sd=0.5),rnorm(70,mean=5,sd=0.5),rnorm(70,mean = 2,sd=0.5),rnorm(70,sd=0.5),rnorm(70,mean = 7,sd=0.5))
## online changepoint detection for series X
resX <- onlinechangepoint(X,
model = "nng",
mu0=0.7,k0=1,alpha0=1/2,beta0=1, #initial parameters
bpmethod = "mean",
lambda=50, #exponential hazard
FILTER=1e-3)
tmpplot(resX,X) # visualize original data and run length distribution
The last statement tmpplot(resX,X) produces a graph comparing the original data (upper part) and the run lengths (lower part):

Different volatility and same mean.
## Generate random samples with real changepoints: 100,170,240,310,380.
Y <- c(rnorm(100,sd=0.5),rnorm(70,sd=1),rnorm(70,sd=3),rnorm(70,sd=1),rnorm(70,sd=0.5))
## online changepoint detection for series Y
resY <- onlinechangepoint(Y,
model = "nng",
mu0=0,k0=0.5,alpha0=1/2,beta0=1,
bpmethod = "mean",
lambda=50, #exponential hazard
FILTER=1e-3)
tmpplot(resY,Y)
Original data and inferred run length distribution:

Poisson observations with different rates.
## Generate random samples with real changepoints: 100,170,240,310,380
Z <- c(rpois(100,lambda=5),rpois(70,lambda=7),rpois(70,lambda=5),rpois(70,lambda=6),rpois(70,lambda=5))
## online changepoint detection for series Z
resZ <- onlinechangepoint(Z,
model = "pg",
alpha0=10,beta0=1,
bpmethod = "mean",
lambda=10, #exponential hazard
FILTER=1e-3)
tmpplot(resZ,Z)
Original data and inferred run length distribution:

[1] Ryan P. Adams and David J.C. MacKay. Bayesian Online Changepoint Detection. 2007.
[2] Checking whether a coin is fair.