Hi,
First-time poster, so apologies in advance if this kind of question is unsuitable for this forum or I’m ignoring local etiquette.
I am doing predictive sports modelling in Stan. If you think of a sequence of N team results (scores) as data, and the team’s strength y as a parameter varying in time, then Chapter 10.1 of the manual shows how to do this – have y as a vector, and condition y[k] on y[k-1] as well as on the result of the k-th game. That is, the core part of the model block would look something like this:
for (k in 1:N) {
  if (k > 1) {
    y[k] ~ normal(y[k-1], sigma);
  }
  score[k] ~ some_distro(some_transform(y[k]));
}
So far so good, but even in this formulation, the final estimate of y[k] is going to take into account the scores from all N games played, not only games 1 through k-1 (this is a crucial point, so please correct me if I’m wrong). But to analyse the predictive power of the model, I need precisely these intermediate estimates. Do these values even “exist” during a Stan run, and if so, is there a way to smuggle them out? Or perhaps I am missing a clever re-parametrization that would make y[k] mean what I need it to? The only way I have found so far is re-running the model on incremental datasets, and that is computationally very expensive.
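To make the incremental approach concrete, here is roughly what one refit looks like: pass in a cut point (I’ll call it N_fit here, just a name for illustration) and let only games 1..N_fit contribute to the likelihood, so y[N_fit+1] is a genuine out-of-sample quantity. The Poisson/exp choices below are only placeholders standing in for some_distro/some_transform, and the priors are illustrative:

data {
  int<lower=2> N;                     // total number of games
  int<lower=1, upper=N-1> N_fit;      // games the likelihood is allowed to see
  int<lower=0> score[N];              // placeholder: integer scores
}
parameters {
  vector[N] y;                        // latent strength before each game
  real<lower=0> sigma;                // random-walk scale (flat prior here)
}
model {
  y[1] ~ normal(0, 1);                // illustrative prior on initial strength
  for (k in 2:N) {
    y[k] ~ normal(y[k-1], sigma);     // random walk over all N games
  }
  for (k in 1:N_fit) {
    score[k] ~ poisson(exp(y[k]));    // only games 1..N_fit inform the fit
  }
}
generated quantities {
  // y[N_fit + 1] has seen no score data, so this is a genuine
  // out-of-sample prediction for the next game.
  int score_pred = poisson_rng(exp(y[N_fit + 1]));
}

Each value of N_fit is still a separate MCMC run, which is where the cost comes from.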
Thanks for any and all pointers.
Thanks Krzysztof. Re 1: this is a good tip and I’m going to re-implement along these lines. Re 2: can you point me to a canonical implementation of a Kalman filter in Stan that doesn’t do “look-ahead”? Just seeing how it can be done would be very useful to me, even if it’s not the best solution.
I don’t have a model with a Kalman filter in it on hand, so it’s probably best to ask that as a separate question on the forum. I think @jrnold did some models like that a while back. For the AR piece, here’s an implementation I do have on hand:
data {
  int T;              // Number of time points.
  int K;              // Number of coefficients.
  int M;              // Length of kappa.
  int y[T];           // Measurement at each time point.
  vector[M] kappa;    // Scaling (not used in this snippet).
  int T_star;         // Not used in this snippet.
}
parameters {
  vector[K] beta;            // AR coefficient(s) on the logit scale.
  vector<lower=0>[1] sigma;  // Innovation scale.
  vector[T] zeta;            // Standardized innovations (non-centered parameterization).
}
transformed parameters {
  vector[T] mu;
  mu[1] = zeta[1];
  for (t in 2:T) {
    // Non-centered AR(1): mu[t] = phi * mu[t-1] + sigma * zeta[t],
    // with phi = inv_logit(beta[1]) kept in (0, 1).
    mu[t] = zeta[t] * sigma[1] + inv_logit(beta[1]) * mu[t-1];
    // print("t: ", t, ", mu[t]: ", mu[t], ", zeta[t]: ", zeta[t],
    //       ", sigma[1]: ", sigma[1], ", beta[1]: ", beta[1],
    //       ", mu[t-1]: ", mu[t-1]);
  }
}
model {
  target += normal_lpdf(beta | 0.0, 1.0);
  target += gamma_lpdf(sigma | 4.0, 10.0);
  target += normal_lpdf(zeta | 0.0, 1.0);
  for (t in 1:T) {
    target += poisson_lpmf(y[t] | exp(mu[t]));
  }
}
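Not part of the snippet above, but if you want posterior predictive draws for checking fit, a generated quantities block along these lines should do it (watch exp(mu[t]): extreme draws can overflow poisson_rng):

generated quantities {
  int y_rep[T];                          // posterior predictive replicates of y
  for (t in 1:T) {
    y_rep[t] = poisson_rng(exp(mu[t]));  // mu comes from transformed parameters
  }
}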
Right, since this is essentially the same issue, I won’t open a new thread, but will repeat a general call for a Kalman filter / time series model example w/o look-ahead. Thanks in advance.
¯\_(ツ)_/¯ This makes it harder for people to find it in the future since the title doesn’t really describe the issue.
You’re right, I am going to collect my thoughts and post another question.
See: Kalman filter in Stan · GitHub
Then someone wrote a followup, which I found, but haven’t read:
That one from Juho Kokkala’s case study was way out of date. You want this sweet bookdown project:
Thanks. This looks like essential reading for me. Thanks to @jrnold too!