Simpler Change Point Models?

Using the example in the Stan manual of marginalising out the discrete time parameter, I created a change point model for UK / South Korean trade (details here if you are interested). @rabaath suggested modelling time as continuous. Doing so results in a simpler model and makes more sense (to me at any rate), as time really is continuous.

I attach the Stan model below. I am happy to create a PR for the Stan manual, but I don’t wish to put effort into doing so if it’s not a good idea.

data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}

parameters {
  real<lower=0, upper=N+1> tau;
  real mu1;
  real mu2;
  real gamma1;
  real gamma2;

  real<lower=0> sigma1;
  real<lower=0> sigma2;
}

model {
  real mu;
  real gamma;
  real sigma;
  real weight;

  mu1 ~ normal(0, 10);
  mu2 ~ normal(0, 10);
  gamma1 ~ normal(0, 10);
  gamma2 ~ normal(0, 10);
  sigma1 ~ normal(0, 10);
  sigma2 ~ normal(0, 10);

  for (i in 1:N) {
    // logistic weight: ~0 well before the change point tau, ~1 well after;
    // the constant 2 sets how sharp the switch-over is
    weight = inv_logit(2 * (i - tau));

    mu = weight * mu2 + (1 - weight) * mu1;
    gamma = weight * gamma2 + (1 - weight) * gamma1;
    sigma = weight * sigma2 + (1 - weight) * sigma1;

    y[i] ~ normal(mu * x[i] + gamma, sigma);
  }
}

Please don’t remove the change-point model as coded in the manual. It’s meant only as a simple example of

  • how to carry out the marginalization, and
  • how to recover discrete quantities of interest

Feel free to make a pull request to the manual that adds this as a parenthetical alternative. You should run it by Andrew first to get sign-off on the model; he has strong opinions on how to code such models, and every time anyone brings up change-point models he suggests making them continuous.

You can recode the model more efficiently and in much less code as

functions {
  // interpolate between regimes: wc weights the pre-change value a[1],
  // w the post-change value a[2] (matching mu1/mu2 etc. in the model above)
  vector interp(vector w, vector wc, vector a) {
    return wc * a[1] + w * a[2];
  }
}
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
transformed data {
  vector[N] steps;
  for (n in 1:N) steps[n] = 2 * n;  // so that w[n] = inv_logit(2 * (n - tau))
}
parameters {
  real<lower=0, upper=N+1> tau;
  vector[2] mu;
  vector[2] gamma;
  vector<lower=0>[2] sigma;
}
model {
  vector[N] w = inv_logit(steps - 2 * tau);
  vector[N] wc = 1 - w;

  y ~ normal(x .* interp(w, wc, mu) + interp(w, wc, gamma),
             interp(w, wc, sigma));

  mu ~ normal(0, 10);
  gamma ~ normal(0, 10);
  sigma ~ normal(0, 10);
}

With this I get a missing “;” error, but afaics all the semi-colons are present and correct.

> fitMC <- stan(
+     file = "lr-changepoint-cont.stan",
+     data = list(x = x, y = y, N = length(y)),
+     chains = 4,
+     warmup = 1000,
+     iter = 10000,
+     cores = 4,
+     refresh = 500,
+     seed=42
+ )
SYNTAX ERROR, MESSAGE(S) FROM PARSER:


ERROR at line 69

 68:    model {
 69:      vector[N] w = inv_logit(steps - 2 * tau);
                      ^
 70:      vector[N] wc = 1 - w;

PARSER EXPECTED: ";"
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model 'lr-changepoint-cont' due to the above error.
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.5

locale:
[1] en_GB/en_GB/en_GB/C/en_GB/en_GB

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] coda_0.18-1          zoo_1.7-13           rstan_2.15.1        
[4] StanHeaders_2.15.0-1 ggplot2_2.2.1       

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8      codetools_0.2-15 lattice_0.20-34  assertthat_0.1  
 [5] grid_3.3.2       plyr_1.8.4       gtable_0.2.0     stats4_3.3.2    
 [9] scales_0.4.1     lazyeval_0.2.0   labeling_0.3     tools_3.3.2     
[13] munsell_0.4.3    compiler_3.3.2   parallel_3.3.2   inline_0.3.14   
[17] colorspace_1.3-1 gridExtra_2.2.1  tibble_1.3.3    

I restarted R and the problem has gone away. It will forever remain one of life’s great unsolved mysteries.


Please don’t remove the change-point model as coded in the manual. It’s meant only as a simple example of how to carry out the marginalization, and how to recover discrete quantities of interest.

I wasn’t proposing to remove anything, only to point out that it seemed simpler (to me at any rate) to treat time as continuous.

Feel free to make a pull request to the manual that adds this as a parenthetical alternative. You should run it by Andrew first to get sign-off on the model; he has strong opinions on how to code such models, and every time anyone brings up change-point models he suggests making them continuous.

What is the best way of running it past Andrew? I imagine he gets hundreds of emails a day.

Email. As far as I can tell, Andrew doesn’t monitor Discourse topics or GitHub issues or pull requests. If you don’t get a response, let me know, and I can bug him in person.

Although time is continuous, when the data are collected the observation time (or time interval) is discretized. There are cases where the discretization is negligible and a continuous-time model can be used, but there are also many cases where it’s better to use a discrete-time model. One of the stranger cases I have encountered was when the observations were recorded with an accuracy of one day, but the data collection process was such that it was sensible to discretize to three-month accuracy. I think it is a good idea to add this kind of model to the manual too, but also to remind readers that although time is continuous, it may have been discretized in the data collection process.

Aki

Thanks very much - I won’t be emailing immediately. I am prepping a presentation on change point detection for a group of folks.

Good point about data collection effectively making time discrete. In my case I was looking at trade data which, as far as I am aware, is only available monthly. The change point being inferred to fall half-way through a month is not a big deal, and I’d probably round it up or down as appropriate.

If you only record things at discrete intervals, then you can take the real value to be a latent value falling in that interval. For example, if you round a continuous quantity to integers, it might be +/- 0.5 from the recorded values. You can add those latent times in such cases—there’s an example in the manual in the missing data chapter.
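For times rounded to integers, a minimal sketch of that latent-value idea might look like this (the variable names t_obs and t_err are mine, purely for illustration):

data {
  int<lower=1> N;
  vector[N] t_obs;  // times as recorded, after rounding to integers
}
parameters {
  // latent rounding error; the bounds make it implicitly uniform(-0.5, 0.5)
  vector<lower=-0.5, upper=0.5>[N] t_err;
}
transformed parameters {
  vector[N] t = t_obs + t_err;  // latent continuous times
}
model {
  // use t in place of the recorded times in the likelihood,
  // e.g. as the index compared against tau in the change-point weight
}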

One question in my mind is whether the linear nature of the interp() function will change how the change point behaves. That is, here the model parameters are a linear combination weighted by the continuous change-point weight, whereas the discretized version of the model puts a simplex of probabilities over the discrete time points. The softmax function is highly non-linear, so the discrete version might allow for more abrupt change breaks, whereas this version might produce smoother ones as the weights evolve.
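Schematically, in the notation of the models above, the marginalized discrete model averages whole likelihoods over the change point,

$$p(y \mid \mu, \gamma, \sigma) = \sum_{\tau=1}^{N} p(\tau) \prod_{n=1}^{N} \operatorname{normal}\!\left(y_n \mid \mu_{r_n} x_n + \gamma_{r_n},\ \sigma_{r_n}\right), \qquad r_n = 1 + \mathbb{I}[n > \tau],$$

whereas the continuous version interpolates the parameters themselves: with $w_n = \operatorname{logit}^{-1}\!\big(2(n - \tau)\big)$ and continuous $\tau$,

$$y_n \sim \operatorname{normal}\!\Big(\big(w_n \mu_2 + (1 - w_n)\,\mu_1\big)\, x_n + w_n \gamma_2 + (1 - w_n)\,\gamma_1,\ \ w_n \sigma_2 + (1 - w_n)\,\sigma_1\Big).$$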

Obviously something that could be tested with simulations.

This is a chance observation arising from some current work, but I wondered whether the 2 in the following statement isn’t just an arbitrary constant:

weight = inv_logit(2 * (i - tau));

So far as I can see it governs the speed of the switch-over, and if (for example) 10 were used instead, the answer might be different.

Would it be a good idea to use a larger value here (e.g. 10 or 20) to reduce the smoothing effect whilst keeping the model continuous?

Yes, it’s arbitrary, and yes, a larger value would make the change point more pronounced.
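If it helps to experiment, here is a sketch of the first model with the constant exposed as data (passing a sharpness k in this way is my own addition, not from the manual; k = 2 recovers the model above):

data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
  real<lower=0> k;  // sharpness of the switch-over; larger = more abrupt
}
parameters {
  real<lower=0, upper=N+1> tau;
  vector[2] mu;
  vector[2] gamma;
  vector<lower=0>[2] sigma;
}
model {
  mu ~ normal(0, 10);
  gamma ~ normal(0, 10);
  sigma ~ normal(0, 10);
  for (i in 1:N) {
    real w = inv_logit(k * (i - tau));
    y[i] ~ normal((w * mu[2] + (1 - w) * mu[1]) * x[i]
                      + w * gamma[2] + (1 - w) * gamma[1],
                  w * sigma[2] + (1 - w) * sigma[1]);
  }
}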

I would like to do a comparison between a model with and one without a change point. Could someone please help me determine the most efficient way to compute and return the log_lik in a model with a change point like this?

Do I have to re-calculate w and wc in the generated quantities block? In that case, I would really rather not save the w and wc variables, as that makes my stanfit object in R much larger than it otherwise would be. Basically, what I’m wondering is whether there’s a way to calculate the log_lik without also having to return w and wc.
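The best I’ve come up with (a sketch against the vectorized model above) is to recompute them inside a local block in generated quantities, since only top-level generated quantities variables are written to the output:

generated quantities {
  vector[N] log_lik;
  {  // local block: w, wc, m, s declared here are not saved
    vector[N] w = inv_logit(steps - 2 * tau);
    vector[N] wc = 1 - w;
    vector[N] m = x .* interp(w, wc, mu) + interp(w, wc, gamma);
    vector[N] s = interp(w, wc, sigma);
    for (n in 1:N)
      log_lik[n] = normal_lpdf(y[n] | m[n], s[n]);
  }
}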

Such constants are often fit as positively constrained parameters in IRT models, where they’re called “discrimination” because the higher the value, the more they amplify ability differences.

The likelihood you want is the data likelihood without the change point. So the change point has to be marginalized out as it is in the example in the user’s guide.
