Hi,

First time poster, so apologies in advance if this kind of question is unsuitable for this forum, or I’m ignoring local etiquette.

I am doing predictive sports modelling in Stan. If you think of a sequence of **N** team results (scores) as data, and the team’s strength **y** as a parameter varying in time, then Chapter 10.1 of the manual shows how to do this – have y as a vector, and condition **y[k]** on **y[k-1]** as well as on the result of the k-th game. That is, the core part of the model block would look something like this:

```
for (k in 1:N) {
  if (k > 1) {
    y[k] ~ normal(y[k-1], sigma);
  }
  score[k] ~ some_distro(some_transform(y[k]));
}
```

So far so good, but even in this formulation the posterior estimate of **y[k]** takes into account the scores of all **N** games, not only games **1** through **k-1** (this is a crucial point, so please correct me if I’m wrong). But to analyse the predictive power of the model, I need precisely these intermediate estimates. Do these values even “exist” during a Stan run, and if so, is there a way to smuggle them out? Or perhaps I am missing a clever re-parametrization that would make **y[k]** mean what I need it to? The only way I have found so far is re-running the model on incremental datasets, which is computationally very expensive.
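To make “re-running on incremental datasets” concrete, here is a toy sketch (Python, with a conjugate normal-mean model standing in for the real Stan fit; purely illustrative): for each game k, the model is refit on games 1..k-1 only, and that fit is used to predict game k.

```python
import numpy as np

# Toy stand-in for "re-run the model on incremental datasets":
# for each k, fit on scores[0:k] only, then predict game k+1.
# Here the "model" is a conjugate normal-mean model (prior N(0, 10^2),
# known obs sd 1.0), so each refit is closed-form; with Stan each
# refit would be a full MCMC run, hence the expense.

def posterior_mean(scores, prior_var=100.0, obs_var=1.0):
    n = len(scores)
    if n == 0:
        return 0.0  # prior mean when no games have been seen
    prec = 1.0 / prior_var + n / obs_var
    return (np.sum(scores) / obs_var) / prec

scores = [1.2, 0.7, 1.5, 0.9]
# one_step_preds[k] conditions only on games 1..k -- no look-ahead.
one_step_preds = [posterior_mean(scores[:k]) for k in range(len(scores))]
```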

Thanks for any and all pointers.

Thanks Krzysztof. Re 1, this is a good tip and I’m going to re-implement along these lines. Re 2, can you point me to a canonical implementation of a Kalman filter in Stan that doesn’t do “look-ahead”? Just seeing how it can be done would be very useful to me, even if it’s not the best solution.

I don’t have a model with a Kalman filter in it on hand, so it’s best to ask that as a separate question on the forum. I think @jrnold did some models like that a while back. For the AR piece, here’s an implementation I do have on hand:

```
data {
  int T;            // number of time points
  int K;            // number of coefficients
  int M;            // (unused below)
  int y[T];         // measurement at each time point
  vector[M] kappa;  // scaling (unused below)
  int T_star;       // (unused below)
}
parameters {
  vector[K] beta;
  vector<lower=0>[1] sigma;
  vector[T] zeta;   // standardized innovations (non-centered parameterization)
}
transformed parameters {
  vector[T] mu;
  mu[1] = zeta[1];
  for (t in 2:T) {
    mu[t] = zeta[t] * sigma[1] + inv_logit(beta[1]) * mu[t-1];
    // print("t: ", t, ", mu[t]: ", mu[t], ", zeta[t]: ", zeta[t],
    //       ", sigma[1]: ", sigma[1], ", beta[1]: ", beta[1],
    //       ", mu[t-1]: ", mu[t-1]);
  }
}
model {
  target += normal_lpdf(beta | 0.0, 1.0);
  target += gamma_lpdf(sigma | 4.0, 10.0);
  target += normal_lpdf(zeta | 0.0, 1.0);
  for (t in 1:T) {
    target += poisson_lpmf(y[t] | exp(mu[t]));
  }
}
```
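As for filtering without look-ahead, the idea is easiest to see outside Stan. Below is a minimal local-level Kalman filter sketch in Python (my own illustration, not a canonical implementation): the prediction step at game k conditions only on observations 1..k-1, which is exactly the intermediate estimate asked about above; the update step then folds in observation k.

```python
import numpy as np

# Local-level (random-walk) model, analogous to y[k] ~ normal(y[k-1], sigma):
#   state: x[k] = x[k-1] + process noise (variance q)
#   obs:   z[k] = x[k]   + measurement noise (variance r)

def kalman_local_level(z, q=0.1, r=1.0, m0=0.0, p0=10.0):
    m, p = m0, p0
    predicted, filtered = [], []
    for zk in z:
        # predict: conditions only on observations 1..k-1 (no look-ahead)
        p = p + q
        predicted.append(m)
        # update: incorporate observation k
        kgain = p / (p + r)
        m = m + kgain * (zk - m)
        p = (1.0 - kgain) * p
        filtered.append(m)
    return np.array(predicted), np.array(filtered)

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(0.0, 0.3, 50))  # latent "strength" random walk
z = x + rng.normal(0.0, 1.0, 50)         # noisy observed "scores"
pred, filt = kalman_local_level(z, q=0.09, r=1.0)
```

For a random walk the prediction step leaves the mean unchanged, so `pred[k+1]` equals `filt[k]`; full Rauch-Tung-Striebel smoothing (the "look-ahead" posterior, the analogue of what Stan reports for `y[k]`) would be a separate backward pass.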

Right, since this is essentially the same issue, I won’t open a new thread, but will repeat *a general call for a Kalman filter / time series model example w/o look-ahead*. Thanks in advance.

`¯\_(ツ)_/¯`

…this makes it harder for people to find it in the future, since the title doesn’t really describe the issue.

You’re right, I am going to collect my thoughts and post another question.


See: https://gist.github.com/jrnold/4700387

Then someone wrote a followup, which I found but haven’t read:

That one from Juho Kokkala’s case study was way out of date. You want this sweet bookdown project:


Thanks. This looks like essential reading for me. Thanks to @jrnold too!