Generated quantities and transformed parameters

I am confused about how to extend the following model to make future predictions. It is a structural time series model.

```stan
data {
  int<lower=0> T;
  vector[T] x;
  vector[T] y;
}

parameters {
  vector[T] u_err; // Level innovation
  vector[T] v_err; // Slope innovation
  real beta;
  real<lower=0> s_obs;
  real<lower=0> s_slope;
  real<lower=0> s_level;
}

transformed parameters {
  vector[T] u; // Level
  vector[T] v; // Slope
  u[1] = u_err[1];
  v[1] = v_err[1];
  for (t in 2:T) {
    u[t] = u[t-1] + v[t-1] + s_level * u_err[t];
    v[t] = v[t-1] + s_slope * v_err[t];
  }
}

model {
  u_err ~ normal(0, 1);
  v_err ~ normal(0, 1);
  y ~ normal(u + beta * x, s_obs);
}
```
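In equations (as I understand it), this is a local linear trend model with a regression component, where $u_t$ is the level and $v_t$ the slope:

$$
\begin{aligned}
y_t &\sim \operatorname{Normal}(u_t + \beta x_t,\; s_{\text{obs}}), \\
u_t &= u_{t-1} + v_{t-1} + s_{\text{level}}\, \epsilon^u_t, \\
v_t &= v_{t-1} + s_{\text{slope}}\, \epsilon^v_t,
\end{aligned}
\qquad \epsilon^u_t,\ \epsilon^v_t \sim \operatorname{Normal}(0, 1).
$$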

For reference, here is my attempt:

```stan
data {
  int<lower=0> T;
  vector[T] x;
  vector[T] y;
  int<lower=0> n_pred;
}

parameters {
  vector[T] u_err; // Level innovation
  vector[T] v_err; // Slope innovation
  real beta;
  real<lower=0> s_obs;
  real<lower=0> s_slope;
  real<lower=0> s_level;
}

transformed parameters {
  vector[T] u; // Level
  vector[T] v; // Slope
  u[1] = u_err[1];
  v[1] = v_err[1];
  for (t in 2:T) {
    u[t] = u[t-1] + v[t-1] + s_level * u_err[t];
    v[t] = v[t-1] + s_slope * v_err[t];
  }
}

model {
  u_err ~ normal(0, 1);
  v_err ~ normal(0, 1);
  y ~ normal(u + beta * x, s_obs);
}

generated quantities {
  vector[T + n_pred] u_pred;
  vector[T + n_pred] u_pred_err;
  vector[T + n_pred] v_pred;
  vector[T + n_pred] v_pred_err;
  vector[T + n_pred] y_pred;

  u_pred[1:T] = u;
  v_pred[1:T] = v;
  u_pred_err[1:T] = u_err;
  v_pred_err[1:T] = v_err;
  y_pred[1:T] = y;

  for (t in (T+1):(T+n_pred)) {
    u_pred_err[t] = normal_rng(0, 1);
    v_pred_err[t] = normal_rng(0, 1);
    u_pred[t] = u_pred[t-1] + v_pred[t-1] + s_level * u_err[t];
    v_pred[t] = v_pred[t-1] + s_slope * v_err[t];
    y_pred[t] = normal_rng(u_pred[t] + beta * x, s_obs);
  }
}
```

[unemployment.csv|attachment](upload://8jXl9JVR8wxWtuWVbCW3Kyz1kuV.csv) (32.1 KB)

A full working example in Python is included below:

```python
import pystan
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
plt.style.use('ggplot')

data = pd.read_csv('unemployment.csv')

stan_code = """data {
  int <lower=0> T;
  vector[T] x;
  vector[T] y;
}

parameters {
  vector[T] u_err; //Slope innovation
  vector[T] v_err; //Level innovation 
  real beta;
  real <lower=0> s_obs;
  real <lower=0> s_slope;
  real <lower=0> s_level;
}

transformed parameters {
  vector[T] u; //Level
  vector[T] v; //Slope
  u[1] = u_err[1];
  v[1] = v_err[1];
  for (t in 2:T) {
    u[t] = u[t-1] + v[t-1] + s_level * u_err[t];
    v[t] = v[t-1] + s_slope * v_err[t];
  }
}

model {
  u_err ~ normal(0,1);
  v_err ~ normal(0,1);
  y ~ normal (u + beta*x, s_obs);
}"""

data_feed = {'y': data['unemployment.office'].values, 'x': np.zeros((data.shape[0], )), 'T': data.shape[0]}
sm = pystan.StanModel(model_code=stan_code)
fit = sm.sampling(data=data_feed, iter=1000)
```
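(Note that this example fits the original model; my extended version would also need the forecast horizon in the data, e.g. `data_feed['n_pred'] = 12`.)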

Hi, I think these should use `u_pred_err` and `v_pred_err`.

@ahartikainen’s correction is right.

You don’t need to save the predictive innovations (they’ll just be standard normal draws), and I’m not sure why you’d want to save everything from time 1 to time T again by defining the vectors in generated quantities to be of size T + n_pred instead of just n_pred.
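For concreteness, here is a minimal sketch of that generated quantities block (untested). It assumes a hypothetical `vector[n_pred] x_pred` of future covariate values added to the `data` block, since `beta * x` needs a scalar covariate at each forecast step rather than the whole in-sample vector `x`; in your Python example `x` is all zeros, so you could drop that term entirely.

```stan
generated quantities {
  vector[n_pred] u_pred; // forecast level
  vector[n_pred] v_pred; // forecast slope
  vector[n_pred] y_pred; // posterior predictive draws

  // Start from the last in-sample state and draw each
  // innovation inline rather than storing it.
  u_pred[1] = u[T] + v[T] + s_level * normal_rng(0, 1);
  v_pred[1] = v[T] + s_slope * normal_rng(0, 1);
  for (t in 2:n_pred) {
    u_pred[t] = u_pred[t-1] + v_pred[t-1] + s_level * normal_rng(0, 1);
    v_pred[t] = v_pred[t-1] + s_slope * normal_rng(0, 1);
  }
  for (t in 1:n_pred)
    y_pred[t] = normal_rng(u_pred[t] + beta * x_pred[t], s_obs);
}
```

Each posterior draw then carries its own simulated forecast path, so quantiles of `y_pred` across draws give you predictive intervals.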

Conceptually, the biggest limitations of predictions like this are (a) they become very uncertain very quickly (for a local linear trend, the forecast variance of the level grows roughly with the cube of the horizon, because slope innovations accumulate), and (b) the assumption that all of the covariates are stationary is often questionable. Modulo those concerns, the basic structure of the model is correct.