Simpler Change Point Models?


#1

Using the example in the Stan manual of marginalising out the discrete time parameter, I created a change point model for UK / South Korean trade (details here if you are interested). @rabaath suggested modelling time as continuous. Doing so results in a simpler model and to me makes more sense (to me at any rate) as time really is continuous.

I attach the Stan below. I am happy to create a PR for the Stan manual but I don’t wish to put effort into doing this if it’s not a good idea.

data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}

parameters {
  real<lower=0, upper=N+1> tau;
  real mu1;
  real mu2;
  real gamma1;
  real gamma2;

  real<lower=0> sigma1;
  real<lower=0> sigma2;
}

model {
  real mu;
  real gamma;
  real sigma;
  real weight;

  mu1 ~ normal(0, 10);
  mu2 ~ normal(0, 10);
  gamma1 ~ normal(0, 10);
  gamma2 ~ normal(0, 10);
  sigma1 ~ normal(0, 10);
  sigma2 ~ normal(0, 10);

  for (i in 1:N) {
    weight = inv_logit(2 * (i - tau));

    mu = weight * mu2 + (1 - weight) * mu1;
    gamma = weight * gamma2 + (1 - weight) * gamma1;
    sigma = weight * sigma2 + (1 - weight) * sigma1;

    y[i] ~ normal(mu * x[i] + gamma, sigma);
  }
}

#2

Please don’t remove the change-point model as coded in the manual. It’s meant only as a simple example of

  • how to carry out the marginalization, and
  • how to recover discrete quantities of interest

Feel free to make a pull request to the manual that adds this a parenthetical alternative. You should run it by Andrew first to get sign off on the model—he has strong opinions on how to code such models and every time anyone brings up change-point models he suggests making them continuous.

You can recode the model more efficiently and in much less code as

functions {
  vector interp(vector w, vector wc, vector a) {
    return w * a[1] + wc * a[2];
  }
}
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
transformed data {
  vector[N] steps;
  for (n in 1:N) steps[n] = 2 * n;
}
parameters {
  real<lower=0, upper=N+1> tau;
  vector[2] mu;
  vector[2] gamma;
  vector<lower=0>[2] sigma;
}
model {
  vector[N] w = inv_logit(steps - 2 * tau);
  vector[N] wc = 1 - w;

  y ~ normal(x .* interp(w, wc, mu) + interp(w, wc, gamma),
             interp(w, wc, sigma));

  mu ~ normal(0, 10);
  gamma ~ normal(0, 10);
  sigma ~ normal(0, 10);
}

#3

With this I get a missing “;” error but afaics all semi-colons are present and correct.

> fitMC <- stan(
+     file = "lr-changepoint-cont.stan",
+     data = list(x = x, y = y, N = length(y)),
+     chains = 4,
+     warmup = 1000,
+     iter = 10000,
+     cores = 4,
+     refresh = 500,
+     seed=42
+ )
SYNTAX ERROR, MESSAGE(S) FROM PARSER:


ERROR at line 69

 68:    model {
 69:      vector[N] w = inv_logit(steps - 2 * tau);
                      ^
 70:      vector[N] wc = 1 - w;

PARSER EXPECTED: ";"
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'lr-changepoint-cont' due to the above error.
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.5

locale:
[1] en_GB/en_GB/en_GB/C/en_GB/en_GB

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] coda_0.18-1          zoo_1.7-13           rstan_2.15.1        
[4] StanHeaders_2.15.0-1 ggplot2_2.2.1       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8      codetools_0.2-15 lattice_0.20-34  assertthat_0.1  
 [5] grid_3.3.2       plyr_1.8.4       gtable_0.2.0     stats4_3.3.2    
 [9] scales_0.4.1     lazyeval_0.2.0   labeling_0.3     tools_3.3.2     
[13] munsell_0.4.3    compiler_3.3.2   parallel_3.3.2   inline_0.3.14   
[17] colorspace_1.3-1 gridExtra_2.2.1  tibble_1.3.3    

#4

I restarted R and the problem has gone away. It will forever remain one of life’s great unsolved mysteries.


#5

Please don’t remove the change-point model as coded in the manual. It’s meant only as a simple example of how to carry out the marginalization, and how to recover discrete quantities of interest

I wasn’t proposing to remove anything only to point out it seemed simpler (to me at any rate) to treat time as continuous.

Feel free to make a pull request to the manual that adds this a parenthetical alternative. You should run it by Andrew first to get sign off on the model—he has strong opinions on how to code such models and every time anyone brings up change-point models he suggests making them continuous.

What is the best way of running it past Andrew? I imagine he gets hundreds of emails a day.


#6

Email. As far as I can tell, Andrew doesn’t monitor Discourse topics or GitHub issues or pull requests. If you don’t get a response, let me know, and I can bug him in person.


#7

Although time is continuous, when the data is collected the observation time (or time interval) is discretized. There are cases where discretization is negligible and continuous time model can be used, but there are also many cases where it’s better to use discrete time model. One of the stranger cases I have encountered, was when the observations were recorded with accuracy of one day, but the data collection process was such that it was sensible to use discretization to three month accuracy. I think it is good idea to add this kind of model to the manual, too, but also remind that although time is continuous it may have been discretized in the data collection process.

Aki


#8

Thanks very much - I won’t be emailing immediately. I am prepping a presentation on change point detection for a group of folks.

Good point about data collection effectively making time discrete. In my case I was looking at trade data which, as far as I aware, only available as monthly. The change point being inferred to be half-way through a month is not a big deal and I’d probably round it up or down depending.


#9

If you only record things at discrete intervals, then you can take the real value to be a latent value falling in that interval. For example, if you round a continuous quantity to integers, it might be +/- 0.5 from the recorded values. You can add those latent times in such cases—there’s an example in the manual in the missing data chapter.


#10

One question in my mind is whether the linear nature of the interp() function will change how the change point works. I.e., the model parameters are a linear combination over the probabilities of the change point, but the discretized version of the model uses a simplex of probabilities for each discrete time point. The softmax function is highly non-linear, so it might allow for more discontinuous change breaks, whereas this version might allow for smoother change breaks as the weights evolve.

Obviously something that could be tested with simulations.