Generated quantities and transformed parameters

I am confused about how to extend the following model, a structural time series model, to make future predictions.

```stan
data {
  int<lower=0> T;
  vector[T] x;
  vector[T] y;
}

parameters {
  vector[T] u_err; // Level innovation
  vector[T] v_err; // Slope innovation
  real beta;
  real<lower=0> s_obs;
  real<lower=0> s_slope;
  real<lower=0> s_level;
}

transformed parameters {
  vector[T] u; // Level
  vector[T] v; // Slope
  u[1] = u_err[1];
  v[1] = v_err[1];
  for (t in 2:T) {
    u[t] = u[t-1] + v[t-1] + s_level * u_err[t];
    v[t] = v[t-1] + s_slope * v_err[t];
  }
}

model {
  u_err ~ normal(0, 1);
  v_err ~ normal(0, 1);
  y ~ normal(u + beta * x, s_obs);
}
```

For reference, here is my attempt:

```stan
data {
  int<lower=0> T;
  vector[T] x;
  vector[T] y;
  int<lower=0> n_pred;
}

parameters {
  vector[T] u_err; // Level innovation
  vector[T] v_err; // Slope innovation
  real beta;
  real<lower=0> s_obs;
  real<lower=0> s_slope;
  real<lower=0> s_level;
}

transformed parameters {
  vector[T] u; // Level
  vector[T] v; // Slope
  u[1] = u_err[1];
  v[1] = v_err[1];
  for (t in 2:T) {
    u[t] = u[t-1] + v[t-1] + s_level * u_err[t];
    v[t] = v[t-1] + s_slope * v_err[t];
  }
}

model {
  u_err ~ normal(0, 1);
  v_err ~ normal(0, 1);
  y ~ normal(u + beta * x, s_obs);
}

generated quantities {
  vector[T+n_pred] u_pred;
  vector[T+n_pred] u_pred_err;
  vector[T+n_pred] v_pred;
  vector[T+n_pred] v_pred_err;
  vector[T+n_pred] y_pred;

  u_pred[1:T] = u;
  v_pred[1:T] = v;
  u_pred_err[1:T] = u_err;
  v_pred_err[1:T] = v_err;
  y_pred[1:T] = y;

  for (t in (T+1):(T+n_pred)) {
    u_pred_err[t] = normal_rng(0, 1);
    v_pred_err[t] = normal_rng(0, 1);
    u_pred[t] = u_pred[t-1] + v_pred[t-1] + s_level * u_err[t];
    v_pred[t] = v_pred[t-1] + s_slope * v_err[t];
    y_pred[t] = normal_rng(u_pred[t] + beta*x, s_obs);
  }
}
```

A full working example in Python is included below (loading `data` itself is omitted):

```python
import pystan
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

plt.style.use('ggplot')

stan_code = """data {
  int<lower=0> T;
  vector[T] x;
  vector[T] y;
}

parameters {
  vector[T] u_err; // Level innovation
  vector[T] v_err; // Slope innovation
  real beta;
  real<lower=0> s_obs;
  real<lower=0> s_slope;
  real<lower=0> s_level;
}

transformed parameters {
  vector[T] u; // Level
  vector[T] v; // Slope
  u[1] = u_err[1];
  v[1] = v_err[1];
  for (t in 2:T) {
    u[t] = u[t-1] + v[t-1] + s_level * u_err[t];
    v[t] = v[t-1] + s_slope * v_err[t];
  }
}

model {
  u_err ~ normal(0, 1);
  v_err ~ normal(0, 1);
  y ~ normal(u + beta * x, s_obs);
}"""

# `data` is a pandas DataFrame with an 'unemployment.office' column,
# loaded elsewhere before this point
data_feed = {'y': data['unemployment.office'].values,
             'x': np.zeros((data.shape[0],)),
             'T': data.shape[0]}
sm = pystan.StanModel(model_code=stan_code)
fit = sm.sampling(data=data_feed, iter=1000)
```

Hi, I think these should use `u_pred_err` and `v_pred_err`.
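With that fix, the forecast recursion in the `generated quantities` loop would read:

```stan
u_pred[t] = u_pred[t-1] + v_pred[t-1] + s_level * u_pred_err[t];
v_pred[t] = v_pred[t-1] + s_slope * v_pred_err[t];
```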

@ahartikainen’s correction is right.

You don’t need to save the predictive innovations (they’re just standard normal draws), and I’m not sure why you’d want to save everything from time `1` to time `T` again rather than defining the vectors in `generated quantities` to be of size `n_pred` instead of `T + n_pred`.
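A minimal sketch of that suggestion (vector names are illustrative): start the recursion from the last fitted state `u[T]`, `v[T]`, draw the innovations inline, and store only the `n_pred` new values. Note that `x` is all zeros in the Python example; with real covariates you would also need future `x` values for the `beta * x` term.

```stan
generated quantities {
  vector[n_pred] u_pred;
  vector[n_pred] v_pred;
  vector[n_pred] y_pred;
  { // local block so the running state is not saved
    real u_prev = u[T];
    real v_prev = v[T];
    for (t in 1:n_pred) {
      u_pred[t] = u_prev + v_prev + s_level * normal_rng(0, 1);
      v_pred[t] = v_prev + s_slope * normal_rng(0, 1);
      // add beta * x_future[t] here if future covariates are available
      y_pred[t] = normal_rng(u_pred[t], s_obs);
      u_prev = u_pred[t];
      v_prev = v_pred[t];
    }
  }
}
```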

Conceptually, the biggest limitations of predictions like this are (a) they become very uncertain very quickly, and (b) the assumption that all of the covariates are stationary is often questionable. Modulo those concerns, the basic structure of the model is correct.
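To illustrate point (a), here is a quick Monte Carlo forward simulation of the local linear trend in plain NumPy, using illustrative scale values rather than posterior draws; in practice you would run this per posterior sample from `fit.extract()`:

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_pred = 5000, 20
s_level, s_slope, s_obs = 0.5, 0.3, 1.0  # illustrative scales, not fitted values

# start every path from the same last observed state (here zero)
u = np.zeros(n_paths)  # level
v = np.zeros(n_paths)  # slope
spread = []
for _ in range(n_pred):
    u = u + v + s_level * rng.standard_normal(n_paths)
    v = v + s_slope * rng.standard_normal(n_paths)
    y = rng.normal(u, s_obs)
    spread.append(y.std())

# predictive standard deviation grows rapidly with horizon,
# since slope uncertainty compounds into the level
print(f"sd at h=1: {spread[0]:.2f}, h=10: {spread[9]:.2f}, h=20: {spread[19]:.2f}")
```

The spread grows roughly cubically in the horizon because slope innovations accumulate into the level, which is why long-range forecasts from this model fan out so fast.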