Bayesian structural time series modeling

Hello,

I am trying to implement local linear trend with regression components in stan (through rstan). This is documented under “Nowcasting” section of “BSTS blog” here - BTW, there is no seasonal component in my case - just trend and regression:

I already use BSTS package, and thought it will be good to be able to implement it in Stan. So, I wrote the basic model along with data simulation, both of which are attached here. It looks like slope and level variance are not well-identified in the model. So, will appreciate pointers on how to implement that equivalent model from BSTS package. I looked around this forum for non-centered, centered time series etc. but didn’t find anything helpful yet on how to go about fixing the problem.

Model:

data {
  int <lower=0> T;
  vector[T] x;
  vector[T] y;
}

parameters {
  vector[T] u; //Level
  vector[T] v; //Slope
  real beta;
  real <lower=0> s_obs;
  real <lower=0> s_slope;
  real <lower=0> s_level;
}

model {

  v[1] ~ normal(0, s_slope);
  tail(v,T-1) ~ normal(head(v,T-1),s_slope);

  u[1] ~ normal(0, s_level*10); //Add a large scale since time series start can be from anywhere in the stream
  tail(u,T-1) ~ normal(head(u,T-1) + head(v,T-1),s_level);

  y ~ normal (u + beta*x,s_obs);
}

Data Generation and Testing (rstan) - looks like s_slope and s_level are poorly identified, and need some kind of centering to avoid the funnel. I have included the pairs plot below.

library('rstan')
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

#Generate dummy data for model testing
# model: y[t] = u[t] + b*x[t] + s_obs , s_* is sigma of noise
#        u[t+1] = u[t] + v[t] + s_level
#        v[t+1] = v[1] + s_slope
beta <- rnorm(250,1,0.1)
v <- rep(-0.0001,250)
u <- rep(0.0,250)
s_slope <- rnorm(250,0,0.005)
s_level <- rnorm(250,0,0.005)
for (i in 2:250){
  u[i] <- u[i-1] + v[i-1] + s_level[i-1]
  v[i] <- v[i-1] + s_slope[i-1]
}
s_obs <- rnorm(250,0,1)
x <- runif(250,0,1)
y <- u + beta*x + s_obs
n <- 250
ytrain <- y[1:n]
xtrain <- x[1:n]

#Load stan model file, and fit to data
fit <- stan(file = 'test.stan', data = list( T = length(ytrain), y = ytrain, x = xtrain) , iter=500, chains=4, control = list(max_treedepth = 10))
# Do pairs plot of different parameters
pairs(fit,pars = c("s_slope","s_level","s_obs","beta"))

rhat for s_slope and s_level are around 1.09 and 1.62 in one simulation. Also, attached the pairs plot.
test.R (909 Bytes)
test.stan (502 Bytes)
pairsplot

For latent time series, you want to declare the innovations as the parameters and construct the variables of interest in transformed parameters according to the assumed generative process. So, it would be something like this

parameters {
  vector[T] u_err;
  vector[T] v_err;
  real<lower=0> s_slope;
  real<lower=0> s_level;
  ...
}
transformed parameters {
  vector[T] u;
  vector[T] v;
  u[1] = u_err[1];
  v[1] = v_err[1];
  for (t in 2:T) {
    u[t] = u[t - 1] + s_level * u_err[t];
    v[t] = v[t - 1] + s_slope * v_err[t];
  }
}
model {
  u_err ~ normal(0,1);
  v_err ~ normal(0,1);
  ...
}

Thanks for pointers. I just updated the model. Now, it looks much better. Updated model below:

data {
  int <lower=0> T;
  vector[T] x;
  vector[T] y;
}

parameters {
  vector[T] u_err; //Slope innovation
  vector[T] v_err; //Level innovation
  real beta;
  real <lower=0> s_obs;
  real <lower=0> s_slope;
  real <lower=0> s_level;
}

transformed parameters {
  vector[T] u; //Level
  vector[T] v; //Slope
  u[1] = u_err[1];
  v[1] = v_err[1];
  for (t in 2:T) {
    u[t] = u[t-1] + v[t-1] + s_level * u_err[t];
    v[t] = v[t-1] + s_slope * v_err[t];
  }
}

model {
  u_err ~ normal(0,1);
  v_err ~ normal(0,1);
  y ~ normal (u + beta*x,s_obs);
}

rhat is close to 1 now for all the parameters. In occassional simulations, it can be close to 1.01 still. I will go work on the priors for now to see if this improves the model by passing it more information.

One question - is there a writeup or reference material somewhere that will provide more insight into why re-parameterizing it this way helps improve s_slope and s_level estimation (deviation of slope and level).

For s_level, we do have:
s_level = (u[t] - (u[t-1]+v[t-1]))/u_err[t]

That is where I am stuck at currently.

I don’t know that we have a case study on this reparameterization of an AR(1) process, but is all the same fundamental idea: Try to make things as independent as possible. If you do a random walk in levels, there is a lot of dependence between the current time point and the previous one. But if you parameterize in terms of innovations, then they are plausibly independent.

1 Like