Multiple Change Point Detection

Hi there,

I am trying to implement multiple change point detection in Stan, based on the website that describes the concept:
https://nowave.it/pages/bayesian-changepoint-detection-with-r-and-stan.html

The model predicts well for one change point, but when I try to add more change points (for example, just two), the predictions are no longer accurate. I am not sure whether I made a mistake in constructing the likelihood or in the generated quantities block where I use the softmax command. I am trying to use dynamic programming to keep the simulation fast as the number of change points increases. I did everything in 1D vector format step by step and then appended the vectors into one vector to construct the final likelihood. Here is the code:


library(rstan)

beta0 <- 3
beta1 <- 9
beta2 <- 15

set.seed(42)
x <- sort(runif(100, 0, 10))
x1 <- head(x, 41)
x2 <- tail(x, 59)
y1 <- rnorm(41, mean=beta0 + beta1 * x1, sd=4.3)
y2 <- rnorm(59, mean=beta0 + beta2 * x2, sd=6.3)

y <- c(y1, y2)
plot(x, y)


stan_code2 <- '
data {
  int<lower=1> N;
  array[N] real x;
  array[N] real y;
}
parameters {
  real beta0;
  real beta1;
  real beta2;
  real beta3;

  real<lower=0> sigma;
}
transformed parameters {
  vector[N] mu1;
  vector[N] mu2;
  vector[N] mu3;

  vector[N] log_p;
  vector[N] log_p2;

  {
    vector[N+1] log_p_e;
    vector[N+1] log_p_l;
    vector[N+1] log_p_m;

    log_p_e[1] = 0;
    log_p_l[1] = 0;
    log_p_m[1] = 0;

    for (i in 1:N) {
      mu1[i] = beta0 + beta1 * x[i];
      mu2[i] = beta0 + beta2 * x[i];
      mu3[i] = beta0 + beta3 * x[i];

      log_p_e[i + 1] = log_p_e[i] + normal_lpdf(y[i] | mu1[i], sigma);
      log_p_l[i + 1] = log_p_l[i] + normal_lpdf(y[i] | mu2[i], sigma);
      log_p_m[i + 1] = log_p_m[i] + normal_lpdf(y[i] | mu3[i], sigma);

    }
    log_p = rep_vector(-log(N) + log_p_l[N + 1], N) + head(log_p_e, N) - head(log_p_l, N);
    log_p2 = rep_vector(-log(N) + log_p_m[N + 1], N) + head(log_p_l, N) - head(log_p_m, N);
    
  }
}
model {
  beta0 ~ normal(0, 100);
  beta1 ~ normal(0, 100);
  beta2 ~ normal(0, 100);
  beta3 ~ normal(0, 100);

  sigma ~ normal(0, 100);
  
  target += log_sum_exp(append_row(log_p, log_p2));
}
generated quantities {
  int<lower=1,upper=N> tau;
  int<lower=1,upper=N> tau2;

  // simplexes are a constrained data type in Stan
  simplex[N] sp;
  simplex[N] sp2;

  sp = softmax(log_p);
  tau = categorical_rng(sp);
  
  sp2 = softmax(log_p2);
  tau2 = categorical_rng(sp2);
}
'

fit <- stan(
  model_code = stan_code2,
  data = list(x = x, y = y, N=length(x)),
  chains = 4,
  iter = 10000,
  cores = 4,
  refresh = 500
)

plot(fit, pars=c("beta0", "beta1", "beta2","beta3","sigma"))

plot(fit, pars=c("tau","tau2"))

I would be thankful if anyone could give me an idea of how to solve the issue. I am guessing that the model should find tau and tau2 both around 42, since there is only one change point in the sample data, but it gives me other values.

Sorry I missed this one earlier, as I’m the one who wrote the change-point section in the Stan User’s Guide.

To extend to two change points, you just need to log_sum_exp over all \binom{N}{2} choices for where the change points go. Generally, I would recommend getting things working on a small data set first, then trying to code the dynamic programming. But I take it the dynamic programming you've set up is for log_p_e[k] to be the log probability of data points 1 to k under the first parameterization, log_p_l[k] the log probability of data points 1 to k under the middle parameterization, and likewise log_p_m[k] for the final set of parameters. This should all be done with a function to avoid all the duplicated code, and you can just use cumulative sum here.

vector prefix_densities(vector y, real mu, real sigma) {
  int N = rows(y);
  vector[N] lp;
  for (n in 1:N) {
    lp[n] = normal_lpdf(y[n] | mu, sigma);
  }
  // element n holds the log density of y[1:n] under (mu, sigma)
  return cumulative_sum(lp);
}

This could be made more efficient still by unfolding the normal_lpdf calls into

vector[N] lp = -0.5 * square((y - mu) / sigma) - log(sigma);

but I would strongly urge saving this kind of optimization until after everything else is working.

Then in the program, you have

vector[N] logp1 = prefix_densities(y, mu[1], sigma[1]);
vector[N] logp2 = prefix_densities(y, mu[2], sigma[2]);
vector[N] logp3 = prefix_densities(y, mu[3], sigma[3]);

and then, assuming you want the two cut points to be in 1:N, you can do this:

array[N * (N - 1) / 2] real lp;
int pos = 1;
for (i in 1:N) {
  for (j in (i + 1):N) {
    lp[pos] = logp1[i] + (logp2[j] - logp2[i]) + (logp3[N] - logp3[j]);
    pos += 1;
  }
}
target += log_sum_exp(lp);

You’ll want to check my algebra, but this is the rough idea. If you want to allow the cut points to be 0 or N + 1, then it’s a bit trickier as you have to deal with all the edge conditions.
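For example, one rough way to handle the empty-segment cases (a sketch only, so double-check the indexing) is to pad the prefix sums with a leading zero, so that a cut at 0 leaves the first segment empty and a cut at N leaves the third segment empty:

vector[N + 1] c1 = append_row(0, logp1);  // c1[k + 1] = log density of y[1:k]; c1[1] = 0
vector[N + 1] c2 = append_row(0, logp2);
vector[N + 1] c3 = append_row(0, logp3);
array[(N + 1) * (N + 2) / 2] real lp;  // one entry per pair 0 <= i <= j <= N
int pos = 1;
for (i in 0:N) {
  for (j in i:N) {
    // segments: y[1:i] under params 1, y[i+1:j] under 2, y[j+1:N] under 3
    lp[pos] = c1[i + 1] + (c2[j + 1] - c2[i + 1]) + (c3[N + 1] - c3[j + 1]);
    pos += 1;
  }
}
target += log_sum_exp(lp);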

You should expect to get a distribution over change points. And the problem with these kinds of mixture models is that they typically have multiple solutions. You can try running with optimization if you want to get the maximum (penalized) likelihood estimate.
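To actually read off that distribution, here is a rough sketch (again, check it carefully) of a generated quantities block that draws the pair (tau1, tau2) in proportion to its posterior mass, assuming the lp array from the 1:N loop above is computed in transformed parameters so it's in scope here:

generated quantities {
  int<lower=1, upper=N> tau1;
  int<lower=1, upper=N> tau2;
  {
    // draw one flat pair index in proportion to its posterior probability
    int pair = categorical_rng(softmax(to_vector(lp)));
    // decode the flat index back into (i, j), mirroring the packing loop
    int pos = 1;
    for (i in 1:N) {
      for (j in (i + 1):N) {
        if (pos == pair) {
          tau1 = i;
          tau2 = j;
        }
        pos += 1;
      }
    }
  }
}

Plotting tau1 and tau2 from the fit then gives the marginal posteriors over the two change points.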


Dear Bob, I really appreciate your taking the time to describe the concept in such detail. This is very clear to me now. Would it be possible for you to kindly add these valuable notes to the Stan manual as well? I see a lot of other similar questions on this topic, the multiple change point model, in this forum. I believe the description you have just written here will be very helpful for all the other researchers working on change point models…