# Simpler Change Point Models?

#1

Using the example in the Stan manual of marginalising out the discrete time parameter, I created a change point model for UK / South Korean trade (details here if you are interested). @rabaath suggested modelling time as continuous. Doing so results in a simpler model and makes more sense to me, as time really is continuous.

I attach the Stan code below. I am happy to create a PR for the Stan manual, but I don’t wish to put effort into doing this if it’s not a good idea.

``````stan
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}

parameters {
  real<lower=0, upper=N+1> tau;
  real mu1;
  real mu2;
  real gamma1;
  real gamma2;

  real<lower=0> sigma1;
  real<lower=0> sigma2;
}

model {
  real mu;
  real gamma;
  real sigma;
  real weight;

  mu1 ~ normal(0, 10);
  mu2 ~ normal(0, 10);
  gamma1 ~ normal(0, 10);
  gamma2 ~ normal(0, 10);
  sigma1 ~ normal(0, 10);
  sigma2 ~ normal(0, 10);

  for (i in 1:N) {
    weight = inv_logit(2 * (i - tau));

    mu = weight * mu2 + (1 - weight) * mu1;
    gamma = weight * gamma2 + (1 - weight) * gamma1;
    sigma = weight * sigma2 + (1 - weight) * sigma1;

    y[i] ~ normal(mu * x[i] + gamma, sigma);
  }
}
``````
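To see what the continuous parameterisation is doing, here is a quick Python sketch of the `inv_logit` weight as a function of the observation index (the slope constant 2 matches the model above; `tau = 10.5` is just an illustrative value):

```python
import math

def inv_logit(z):
    # Logistic function: maps the real line to (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

tau = 10.5  # hypothetical change point between observations 10 and 11
weights = [inv_logit(2 * (i - tau)) for i in range(1, 21)]

# Observations well before tau get weight ~0 (first regime dominates),
# observations well after tau get weight ~1 (second regime dominates),
# and the transition is smooth in a narrow band around tau.
print([round(w, 3) for w in weights])
```

So each observation's likelihood is a smooth blend of the two regimes rather than a hard switch, which is what lets Stan's HMC sampler handle `tau` directly.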

#2

Please don’t remove the change-point model as coded in the manual. It’s meant only as a simple example of

• how to carry out the marginalization, and
• how to recover discrete quantities of interest

Feel free to make a pull request to the manual that adds this as a parenthetical alternative. You should run it by Andrew first to get sign-off on the model; he has strong opinions on how to code such models, and every time anyone brings up change-point models he suggests making them continuous.

You can recode the model more efficiently, and in much less code, as

``````stan
functions {
  vector interp(vector w, vector wc, vector a) {
    return w * a[1] + wc * a[2];
  }
}
data {
  int<lower=1> N;
  vector[N] x;
  vector[N] y;
}
transformed data {
  vector[N] steps;
  for (n in 1:N) steps[n] = 2 * n;
}
parameters {
  real<lower=0, upper=N+1> tau;
  vector[2] mu;
  vector[2] gamma;
  vector<lower=0>[2] sigma;
}
model {
  vector[N] w = inv_logit(steps - 2 * tau);
  vector[N] wc = 1 - w;

  y ~ normal(x .* interp(w, wc, mu) + interp(w, wc, gamma),
             interp(w, wc, sigma));

  mu ~ normal(0, 10);
  gamma ~ normal(0, 10);
  sigma ~ normal(0, 10);
}
``````
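As a quick sanity check (in Python rather than Stan; the values of `N` and `tau` are arbitrary), the `steps` trick yields exactly the per-observation weights from the first post, since `steps[n] - 2 * tau = 2 * (n - tau)`:

```python
import math

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

N, tau = 8, 3.7  # arbitrary illustrative values

# Vectorized form: steps[n] = 2n, so steps[n] - 2*tau = 2*(n - tau)
steps = [2 * n for n in range(1, N + 1)]
w_vec = [inv_logit(s - 2 * tau) for s in steps]

# Per-observation form from the first post
w_loop = [inv_logit(2 * (n - tau)) for n in range(1, N + 1)]

assert all(abs(a - b) < 1e-12 for a, b in zip(w_vec, w_loop))
```

Note that the two posts attach `w` to opposite regimes (the first post's weight multiplies `mu2`, while `interp` multiplies `a[1]` by `w`); the models are equivalent up to relabelling the parameters.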

#3

With this I get a missing “;” error, but as far as I can see all semicolons are present and correct.

``````
> fitMC <- stan(
+     file = "lr-changepoint-cont.stan",
+     data = list(x = x, y = y, N = length(y)),
+     chains = 4,
+     warmup = 1000,
+     iter = 10000,
+     cores = 4,
+     refresh = 500,
+     seed=42
+ )
SYNTAX ERROR, MESSAGE(S) FROM PARSER:

ERROR at line 69

68:    model {
69:      vector[N] w = inv_logit(steps - 2 * tau);
^
70:      vector[N] wc = 1 - w;

PARSER EXPECTED: ";"
Error in stanc(file = file, model_code = model_code, model_name = model_name,  :
failed to parse Stan model 'lr-changepoint-cont' due to the above error.
> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.5

locale:
[1] en_GB/en_GB/en_GB/C/en_GB/en_GB

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] coda_0.18-1          zoo_1.7-13           rstan_2.15.1

loaded via a namespace (and not attached):
[1] Rcpp_0.12.8      codetools_0.2-15 lattice_0.20-34  assertthat_0.1
[5] grid_3.3.2       plyr_1.8.4       gtable_0.2.0     stats4_3.3.2
[9] scales_0.4.1     lazyeval_0.2.0   labeling_0.3     tools_3.3.2
[13] munsell_0.4.3    compiler_3.3.2   parallel_3.3.2   inline_0.3.14
[17] colorspace_1.3-1 gridExtra_2.2.1  tibble_1.3.3
``````

#4

I restarted R and the problem has gone away. It will forever remain one of life’s great unsolved mysteries.

#5

> Please don’t remove the change-point model as coded in the manual. It’s meant only as a simple example of how to carry out the marginalization, and how to recover discrete quantities of interest.

I wasn’t proposing to remove anything, only to point out that it seemed simpler (to me at any rate) to treat time as continuous.

> Feel free to make a pull request to the manual that adds this as a parenthetical alternative. You should run it by Andrew first to get sign-off on the model—he has strong opinions on how to code such models and every time anyone brings up change-point models he suggests making them continuous.

What is the best way of running it past Andrew? I imagine he gets hundreds of emails a day.

#6

Email. As far as I can tell, Andrew doesn’t monitor Discourse topics or GitHub issues or pull requests. If you don’t get a response, let me know, and I can bug him in person.

#7

Although time is continuous, when the data are collected the observation time (or time interval) is discretized. There are cases where the discretization is negligible and a continuous-time model can be used, but there are also many cases where it’s better to use a discrete-time model. One of the stranger cases I have encountered was when the observations were recorded with an accuracy of one day, but the data collection process was such that it was sensible to discretize to three-month accuracy. I think it is a good idea to add this kind of model to the manual too, but also to note that although time is continuous, it may have been discretized in the data collection process.

Aki

#8

Thanks very much. I won’t be emailing immediately; I am prepping a presentation on change point detection for a group of folks.

Good point about data collection effectively making time discrete. In my case I was looking at trade data which, as far as I am aware, is only available monthly. The change point being inferred to fall half-way through a month is not a big deal, and I’d probably round it up or down as appropriate.

#9

If you only record things at discrete intervals, then you can take the real value to be a latent value falling in that interval. For example, if you round a continuous quantity to integers, it might be +/- 0.5 from the recorded values. You can add those latent times in such cases—there’s an example in the manual in the missing data chapter.
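The interval idea can be sketched numerically (a toy Python illustration, not the manual’s code; `latent_bounds` is a hypothetical helper):

```python
def latent_bounds(y_obs, precision=1.0):
    # A value rounded to the nearest `precision` constrains the latent
    # continuous value to +/- precision/2 around the recorded value.
    half = precision / 2.0
    return (y_obs - half, y_obs + half)

# Trade data recorded monthly: a change point reported as month 7
# corresponds to a latent time somewhere in (6.5, 7.5).
lo, hi = latent_bounds(7, precision=1.0)
print(lo, hi)  # 6.5 7.5
```

In Stan, those bounds would become the constraints on a latent parameter, e.g. `real<lower=6.5, upper=7.5> t_latent;`, which the sampler then averages over.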

#10

One question in my mind is whether the linear nature of the interp() function will change how the change point behaves. That is, the model parameters are a linear combination weighted by the change-point probabilities, whereas the discretized version of the model uses a simplex of probabilities over the discrete time points. The softmax function is highly non-linear, so it might allow for sharper change breaks, whereas this version might allow for smoother change breaks as the weights evolve.

Obviously something that could be tested with simulations.
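One cheap check along those lines is a Python sketch of how the transition sharpness depends on a slope constant `k` (generalising the hard-coded 2 in the models above, in `inv_logit(k * (t - tau))`):

```python
import math

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

tau = 10.0
# Weight one time step before and after tau, for several slopes k.
# As k grows, the weights approach a discrete 0/1 switch.
for k in (0.5, 2.0, 10.0):
    before = inv_logit(k * ((tau - 1) - tau))  # t = tau - 1
    after = inv_logit(k * ((tau + 1) - tau))   # t = tau + 1
    print(f"k={k}: w(tau-1)={before:.4f}, w(tau+1)={after:.4f}")
```

So the fixed slope in the continuous model is an implicit smoothness assumption; `k` could itself be given a prior and estimated if the data warranted a sharper break.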