I am using models as defined in bsts with Stan. There is a useful thread at
I am interested in the forecasting side, using generated quantities. I have a semi local linear trend model. Based on the example above, I currently have:
data {
int <lower=0> N;
int <lower=0> N_forecast;
int <lower=0> K;
matrix[N,K] x;
matrix[N_forecast,K] x_forecast;
vector[N] y;
real beta_alpha;
real alpha_var;
real beta_beta;
real gamma_s_level_u;
real gamma_s_level_d;
real gamma_s_obs;
}
parameters {
vector[N] u_err;
vector[N] d_err;
vector[K] beta;
real alpha;
real <lower=0> s_obs;
real <lower=0, upper=1.0> rho;
real <lower=0> s_level_u;
real <lower=0> s_level_d;
}
transformed parameters {
vector[N] u;
vector[N] d;
d[1] = d_err[1];
u[1] = y[1] + u_err[1];
for (t in 2:N) {
u[t] = u[t-1] + d[t-1] + u_err[t];
d[t] = rho*d[t-1] + d_err[t];
}
}
model {
s_level_u ~ gamma(1, gamma_s_level_u);
s_level_d ~ gamma(1, gamma_s_level_d);
s_obs ~ gamma(1, gamma_s_obs);
rho ~ beta(beta_alpha, beta_beta);
u_err ~ normal(0,s_level_u);
d_err ~ normal(0,s_level_d);
alpha ~ cauchy(0,alpha_var);
for(i in 1:K) {
beta[i] ~ cauchy(0,2.5);
}
y ~ normal (u + alpha + x*beta,s_obs);
}
generated quantities {
vector[N] y_test;
vector[N_forecast] y_forecast;
vector[N] log_lik;
vector[N_forecast] u_forecast;
vector[N_forecast] d_forecast;
u_forecast[1] = normal_rng(u[N] + d[N], s_level_u);
d_forecast[1] = normal_rng(rho*d[N], s_level_d);
for (t in 2:N_forecast) {
u_forecast[t] = normal_rng(u_forecast[t-1] + d_forecast[t-1], s_level_u);
d_forecast[t] = normal_rng(rho*d_forecast[t-1], s_level_d);
}
for (t in 1:N_forecast) {
y_forecast[t] = normal_rng(u_forecast[t] + alpha + x_forecast[t,]*beta, s_obs);
}
Am I on the right lines for forecasting? I’m not sure how stan does the sampling of models where there is a vector, particularly the sampling of u_err and d_err.
On checking, it does seem that u_err and d_err are simply normal distributions, so I can sample in the proposed way. I may have been misled earlier because I was using a small number of samples and u_err and d_err did not seem to be normal.
The other thing which surprises me is how much better the fit is to training data compared with using a simple lagged variable - using a structural time series looks too good to be true - but the forecasting does have reasonable uncertainty.
Correction - on checking again, d_err is certainly not normal. The mean over time for each sample is negative.
so my impression is that for each sample there is a normal distribution over time with zero mean, but each sample is sampled from a distribution and accepted/rejected according to the likelihood, or something like that. Why is
sd(apply(d_err, 1, mean))
so low?
If this low value is correct, then we need to account for it in the forecasting?
On further investigation, looking at the diagnostics file, it turn out that each time step is treated as a separate parameter, so if you have 2000 time steps there are 2000 independent u_err parameters and 2000 independent d_err parameters. I am no longer surprised I get such a good fit to historical data. This topic needs a lot more further investigation.