Bsts and forecasting

I am using models as defined in bsts with Stan. There is a useful thread at

I am interested in the forecasting side, using generated quantities. I have a semi local linear trend model. Based on the example above, I currently have:

data {
  int <lower=0> N;
  int <lower=0> N_forecast;
  int <lower=0> K;
  matrix[N,K] x;
  matrix[N_forecast,K] x_forecast;
  vector[N] y;
  real beta_alpha;
  real alpha_var;
  real beta_beta;
  real gamma_s_level_u;
  real gamma_s_level_d;
  real gamma_s_obs;

parameters {
  vector[N] u_err; 
  vector[N] d_err; 
  vector[K] beta;
  real alpha;
  real <lower=0> s_obs;
  real <lower=0, upper=1.0> rho;
  real <lower=0> s_level_u;
  real <lower=0> s_level_d;

transformed parameters {
  vector[N] u;
  vector[N] d;
     d[1] = d_err[1];
     u[1] = y[1] + u_err[1];
  for (t in 2:N) {
     u[t] = u[t-1] + d[t-1] + u_err[t];
     d[t] = rho*d[t-1]  + d_err[t];

model {
  s_level_u ~ gamma(1, gamma_s_level_u);
  s_level_d ~ gamma(1, gamma_s_level_d);
  s_obs ~ gamma(1, gamma_s_obs);
  rho ~ beta(beta_alpha, beta_beta);
  u_err ~ normal(0,s_level_u);
  d_err ~ normal(0,s_level_d);
  alpha ~ cauchy(0,alpha_var); 
  for(i in 1:K) {
     beta[i] ~ cauchy(0,2.5);
  y ~ normal (u + alpha + x*beta,s_obs);

generated quantities {
  vector[N] y_test;
  vector[N_forecast] y_forecast;
  vector[N] log_lik;
  vector[N_forecast] u_forecast;
  vector[N_forecast] d_forecast;
     u_forecast[1] = normal_rng(u[N] + d[N], s_level_u);
     d_forecast[1] = normal_rng(rho*d[N], s_level_d);
  for (t in 2:N_forecast) {
       u_forecast[t] = normal_rng(u_forecast[t-1] + d_forecast[t-1], s_level_u);
       d_forecast[t] = normal_rng(rho*d_forecast[t-1], s_level_d);

  for (t in 1:N_forecast) {
     y_forecast[t] = normal_rng(u_forecast[t] + alpha + x_forecast[t,]*beta, s_obs);

Am I on the right lines for forecasting? I’m not sure how stan does the sampling of models where there is a vector, particularly the sampling of u_err and d_err.

1 Like

On checking, it does seem that u_err and d_err are simply normal distributions, so I can sample in the proposed way. I may have been misled earlier because I was using a small number of samples and u_err and d_err did not seem to be normal.

The other thing which surprises me is how much better the fit is to training data compared with using a simple lagged variable - using a structural time series looks too good to be true - but the forecasting does have reasonable uncertainty.

Correction - on checking again, d_err is certainly not normal. The mean over time for each sample is negative.

mean(apply(d_err, 1, mean)) = -0.00789604
sd(apply(d_err, 1, mean)) = 0.002974634
mean(apply(d_err, 2, mean)) = -0.00789604
sd(apply(d_err, 2, mean)) = 0.2402744
mean(ext_fit$s_level_d) = 0.264176

so my impression is that for each sample there is a normal distribution over time with zero mean, but each sample is sampled from a distribution and accepted/rejected according to the likelihood, or something like that. Why is

sd(apply(d_err, 1, mean))

so low?

If this low value is correct, then we need to account for it in the forecasting?

Generating a random data set with mean zero, sd 0.264, I get the following:

mean(apply(d_err, 1, mean)) = 0.00027
sd(apply(d_err, 1, mean)) = 0.0083
mean(apply(d_err, 2, mean)) = 0.00027
sd(apply(d_err, 2, mean)) = 0.0118
mean(apply(d_err, 1, sd)) = 0.264
mean(apply(d_err, 2, sd)) = 0.264

which has very different characteristics.

On further investigation, looking at the diagnostics file, it turn out that each time step is treated as a separate parameter, so if you have 2000 time steps there are 2000 independent u_err parameters and 2000 independent d_err parameters. I am no longer surprised I get such a good fit to historical data. This topic needs a lot more further investigation.