# Bsts and forecasting

I am using models as defined in bsts with Stan. There is a useful thread at

I am interested in the forecasting side, using generated quantities. I have a semi local linear trend model. Based on the example above, I currently have:

``````data {
int <lower=0> N;
int <lower=0> N_forecast;
int <lower=0> K;
matrix[N,K] x;
matrix[N_forecast,K] x_forecast;
vector[N] y;
real beta_alpha;
real alpha_var;
real beta_beta;
real gamma_s_level_u;
real gamma_s_level_d;
real gamma_s_obs;
}

parameters {
vector[N] u_err;
vector[N] d_err;
vector[K] beta;
real alpha;
real <lower=0> s_obs;
real <lower=0, upper=1.0> rho;
real <lower=0> s_level_u;
real <lower=0> s_level_d;
}

transformed parameters {
vector[N] u;
vector[N] d;
d = d_err;
u = y + u_err;
for (t in 2:N) {
u[t] = u[t-1] + d[t-1] + u_err[t];
d[t] = rho*d[t-1]  + d_err[t];
}
}

model {
s_level_u ~ gamma(1, gamma_s_level_u);
s_level_d ~ gamma(1, gamma_s_level_d);
s_obs ~ gamma(1, gamma_s_obs);
rho ~ beta(beta_alpha, beta_beta);
u_err ~ normal(0,s_level_u);
d_err ~ normal(0,s_level_d);
alpha ~ cauchy(0,alpha_var);

for(i in 1:K) {
beta[i] ~ cauchy(0,2.5);
}
y ~ normal (u + alpha + x*beta,s_obs);
}

generated quantities {
vector[N] y_test;
vector[N_forecast] y_forecast;
vector[N] log_lik;
vector[N_forecast] u_forecast;
vector[N_forecast] d_forecast;
u_forecast = normal_rng(u[N] + d[N], s_level_u);
d_forecast = normal_rng(rho*d[N], s_level_d);

for (t in 2:N_forecast) {
u_forecast[t] = normal_rng(u_forecast[t-1] + d_forecast[t-1], s_level_u);
d_forecast[t] = normal_rng(rho*d_forecast[t-1], s_level_d);
}

for (t in 1:N_forecast) {
y_forecast[t] = normal_rng(u_forecast[t] + alpha + x_forecast[t,]*beta, s_obs);
}

``````

Am I on the right lines for forecasting? I’m not sure how stan does the sampling of models where there is a vector, particularly the sampling of u_err and d_err.

1 Like

On checking, it does seem that u_err and d_err are simply normal distributions, so I can sample in the proposed way. I may have been misled earlier because I was using a small number of samples and u_err and d_err did not seem to be normal.

The other thing which surprises me is how much better the fit is to training data compared with using a simple lagged variable - using a structural time series looks too good to be true - but the forecasting does have reasonable uncertainty.

Correction - on checking again, d_err is certainly not normal. The mean over time for each sample is negative.

mean(apply(d_err, 1, mean)) = -0.00789604
sd(apply(d_err, 1, mean)) = 0.002974634
mean(apply(d_err, 2, mean)) = -0.00789604
sd(apply(d_err, 2, mean)) = 0.2402744
mean(ext_fit\$s_level_d) = 0.264176

so my impression is that for each sample there is a normal distribution over time with zero mean, but each sample is sampled from a distribution and accepted/rejected according to the likelihood, or something like that. Why is

sd(apply(d_err, 1, mean))

so low?

If this low value is correct, then we need to account for it in the forecasting?

Generating a random data set with mean zero, sd 0.264, I get the following:

mean(apply(d_err, 1, mean)) = 0.00027
sd(apply(d_err, 1, mean)) = 0.0083
mean(apply(d_err, 2, mean)) = 0.00027
sd(apply(d_err, 2, mean)) = 0.0118
mean(apply(d_err, 1, sd)) = 0.264
mean(apply(d_err, 2, sd)) = 0.264

which has very different characteristics.

On further investigation, looking at the diagnostics file, it turn out that each time step is treated as a separate parameter, so if you have 2000 time steps there are 2000 independent u_err parameters and 2000 independent d_err parameters. I am no longer surprised I get such a good fit to historical data. This topic needs a lot more further investigation.