Using posterior samples as data for next model

dstauffer11 · May 27, 2020, 1:46pm

Hi!

I started by trying to replicate the CausalImpact bayesian state-space modeling package in Stan. I think I got it working pretty well, the model code is here (thanks mostly to another post on this board). Also attaching an image comparing the package result to my own. T is the total number of time steps, T0 is the time of intervention.

data {
  int<lower=2> T;
  int<lower=1> T0;
  int<lower=1> N;
  matrix[T, N] X; // correlated time series
  vector[T] y; // intervention time series
}

parameters {
  vector[T] u_err; //Slope innovation
  vector[T] v_err; //Level innovation
  vector[N] beta; // correlated time series coefficients
  real<lower=0> s_obs;
  real<lower=0> s_slope;
  real<lower=0> s_level;
}

transformed parameters {
  vector[T0] u; //Level
  vector[T0] v; //Slope
  u[1] = u_err[1];
  v[1] = v_err[1];
  for (t in 2:T0) {
    u[t] = u[t-1] + v[t-1] + s_level * u_err[t];
    v[t] = v[t-1] + s_slope * v_err[t];
  }
}

model {
  u_err ~ normal(0,1);
  v_err ~ normal(0,1);
  beta ~ normal(0, 1);
  s_obs ~ normal(0, 5);
  s_slope ~ normal(0, 0.1);
  s_level ~ normal(0, 0.1);

  y[1:T0] ~ normal(u + X[1:T0, :]*beta, s_obs);
}

generated quantities {
  vector[T] y_rep;       // predicted time series without intervention
  vector[T-T0] u_rep;
  vector[T-T0] v_rep;

  for (t in 1:T0) {
    y_rep[t] = normal_rng(u[t] + X[t, :]*beta, s_obs);
  }
  
  
  u_rep[1] = u[T0] + v[T0] + s_level*u_err[T0];
  v_rep[1] = v[T0] + s_slope * v_err[T0];
  y_rep[T0+1] = normal_rng(u_rep[1] + X[T0+1, :]*beta, s_obs);
  for (t in 2:T-T0) {
    u_rep[t] = u_rep[t-1] + v_rep[t-1] + s_level * u_err[t+T0];
    v_rep[t] = v_rep[t-1] + s_slope * v_err[t+T0];
    y_rep[t+T0] = normal_rng(u_rep[t] + X[t+T0, :]*beta, s_obs);
  }
}

However, what I want to do next is make a hierarchical model across multiple time series-interventions. It gets messy for two reasons:

The time series don’t all have the same lengths before/after intervention. Aligning things in Stan could get messy but I want to use as much pre-intervention data as possible to inform the coefficients beta.
I’m estimating the effect indirectly, by subtracting the posterior samples of y_rep from observed y. Maybe there’s a better way to do this in the model block but I’m not sure how, I could easily adapt the generated quantities to handle that.

So my questions are two:

If I decide to handle the messy non-aligned indexing in Stan, is there a way to measure the benefits in the model block (y[T0:T] - y_rep[T0:T]). If so, I think I know how to put a hierarchical model on top of that.
If the above doesn’t work, can I use the posterior samples as data for a second model? I couldn’t find anything about this approach on here, in theory I don’t see why this wouldn’t work. If it’s possible, someone must have done it before so any pointers on how to do it well?

Thanks for any help! I’m still fairly new to Stan so hopefully this approach and my questions make sense.!

dstauffer11 · May 27, 2020, 1:48pm

My replication of a CausalImpact result:

mike-lawrence · May 27, 2020, 2:04pm

If you mean using the posterior of the model as a prior for the next model, there’s been a lot of discussion on this here across multiple threads (see here for example), and the answer is that there is no automatic way of doing it, doing it by hand would require a ton of simplification (ex. we have limited multivariate distributions, so you’d be collapsing the full dimensional posterior into simpler marginals to which you try to find defined distributions that fit well enough and hope you don’t lose too much info in the process), so it’s generally better to try to work out how to model all the data at once rather than iteratively in this manner.

If you mean use the samples from the posterior as literal data input to a second model, that really does not make any inferential sense as you would be able to generate an arbitrarily narrow/confident posterior by simply changing the number of samples provided as input.

dstauffer11 · May 27, 2020, 2:10pm

I meant the second, and when you put it that way I see why using posterior samples as data wouldn’t directly work. So I guess I’m reduced to the first question, thank you!

mike-lawrence · May 27, 2020, 2:13pm

You might consider re-posting your original post with the second question stripped off. As a volunteer, I tend to look for posts with no replies, and I suspect others do too, so when a multi-question post like this gets one question answered, the second unanswered question might not get addressed. We really should make it explicit to posters that it’s best to ask one question per thread.

dstauffer11 · May 27, 2020, 2:17pm

Okay, I’ll wait a bit and if this question dries up I’ll post the reduced version. Thanks for the advice.

Topic		Replies	Views
Combining spline and state space model Modeling fitting-issues , state-space , splines	5	1305	May 29, 2020
Help on a state space modeling Modeling pystan , techniques , performance	3	852	June 24, 2020
Time-series in Stan, I am new to Stan and need hints to develop the model. THANKS Modeling rstan , specification	43	2837	June 12, 2020
Updating State Space Model after receiving new data? Modeling state-space	1	94	September 13, 2024
Sequentional Bayesian Infernece for State Space Model Modeling rstan , techniques , specification , cmdstanr	4	609	February 2, 2024

Using posterior samples as data for next model

Related topics