I am currently using Stan to estimate a fairly complex model with multiple layers of latent variables. Put simply, I would ideally like to run two parallel chains, where the parameter estimates (of a reduced model) from one chain are passed as data to the full model run by the other chain at every MCMC iteration. But I suspect this is not possible in Stan. So I wonder instead whether, within one chain, I can specify two models with two targets to achieve this?

Any input on this will be highly appreciated! Thank you so much.

What you’re describing is having two sets of parameters, A and B, where parameters in A influence parameters in B but parameters in B do not influence parameters in A?

If so, then the answer is that this is not possible in Stan. I’ll leave the “why” to some old posts on the forum (it’s kind of a long explanation).

It’s possible to accomplish something like this with multiple imputation: run one model, sample parameters from its posterior, feed those into another model, then repeat the sampling + second-model steps a bunch of times and mix the posteriors back together. It can be a bit slow doing this.

You’ll have to think about what that means/if that makes sense for your model though.

Thank you so much for the help, Ben.
Yes, you are right in describing the overall problem that I have.

For your suggested approach “with multiple imputation by running one model, sampling parameters from the posterior, feeding that into another model, and then repeating the sampling + second model steps a bunch and mixing the posteriors back together”: I am not sure whether I understand it or how to implement it. Do you mean writing both models A and B in the “model” block as one model? I would also put the estimates from A into the transformed parameters for model B, so that they can be used directly in the “model” block.

Does this reflect the procedure you described? Thanks.

You can alternate models conveniently via rstan, though you have to do some warm-up plus one sampling iteration in each case. Whether it’s a good idea is a whole other story.

A and B are two separate models. The values from the posterior of A would enter the B model through data. In super questionable pseudocode, the multiple imputation thing looks like:

run model A
for i in 1:1000 {
  draw theta from the posterior of A
  run model B, feeding in theta as data
  save the posterior from model B in a list
}
mix the posteriors of all the B runs together
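To make that loop concrete, here is a runnable numpy stand-in (not actual Stan/rstan code: the conjugate-normal “models”, the data, and the names theta and phi are all invented for illustration, so each posterior can be drawn analytically instead of via MCMC):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: model A is y_A ~ N(theta, 1); model B is
# y_B ~ N(theta + phi, 1), with theta fed in as data.
y_A = rng.normal(1.0, 1.0, size=50)
y_B = rng.normal(3.0, 1.0, size=50)

def posterior_A(y):
    """Posterior of theta given y, with a N(0, 10^2) prior and unit noise."""
    prec = 1.0 / 100.0 + len(y)
    return np.sum(y) / prec, np.sqrt(1.0 / prec)

def posterior_B(y, theta):
    """Posterior of phi given y, treating theta as fixed data."""
    prec = 1.0 / 100.0 + len(y)
    return np.sum(y - theta) / prec, np.sqrt(1.0 / prec)

# "Run model A" once, then: draw theta, "run" B given theta, pool the draws.
m_A, s_A = posterior_A(y_A)
pooled = []
for _ in range(1000):
    theta = rng.normal(m_A, s_A)         # one draw from A's posterior
    m_B, s_B = posterior_B(y_B, theta)   # model B sees theta as data
    pooled.append(rng.normal(m_B, s_B))  # save a draw from B's posterior
pooled = np.asarray(pooled)              # the mixed posterior for phi
print(round(pooled.mean(), 2))           # should land near phi = 2
```

In a real rstan workflow, `posterior_A`/`posterior_B` would each be a `sampling()` call on a compiled model, and the pooling step would concatenate the saved draws.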

Google around for “multiple imputation in Stan” or “cut in Stan” and you’ll probably see discussions that relate to your problem.

Truthfully, the implementation probably isn’t going to be very fun, and I don’t think this sort of thing is advisable (just going from the Gelman blog and the other times this conversation has come up). You’ll have to think about what it means for your problem if you use it.

What’s wrong with your original, complex model? Is it not sampling efficiently?

I figured out a solution similar to what you suggested. I write the joint likelihood of models A and B, and also create different datasets for each model (even though in fact they share the same response variables). This way, model A is independent of B, and I can pass the fitted values from model A to model B in the model block.

Alternatively, I think I could run two Stan models, A and B, in sequence, passing the posteriors from A as data to model B. But based on my testing so far, the simpler version above seems to work well.

It’s not possible from within the Stan language itself. You can do various things like this from the outside, as others have suggested, but Stan’s not ideally set up for it in all the interfaces yet (you need to do things like initialize mass matrices, etc., each time).

What you’re talking about is the “cut” operation in BUGS, which isn’t Bayesian. We don’t support that in Stan. The cases where it’s kosher are things like the use of generated quantities in Stan (which don’t feed back into the model).

This comes up with multiple imputation, where you run model A to get many draws from the posterior. Then, for each such draw from the posterior, you run model B, and finally you put everything back together to get a joint posterior. This also isn’t Bayesian, despite gluing two Bayesian steps together.

I’m curious about multiple imputation being described as non-Bayesian, since it’s described as a “gold standard” in Plummer’s “Cuts in Bayesian graphical models”, especially when contrasted with BUGS’s cut(). Are you suggesting that it’s a bad approach?

I was actually there for Martyn’s presentation at MCMSki, but that link is paywalled.

I’m suggesting it’s suboptimal and only done for computational expediency.

The “gold standard” would be building a joint model p(y, y_miss, theta) and doing full joint Bayesian inference for y_miss and theta (where y is the observed data, y_miss the missing data, and theta the parameters).

Multiple imputation only approximates that in most cases. The reason it’s not fully Bayesian is that there’s restricted information flow, as described by Martyn Plummer (and in the BUGS book).
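As a toy illustration of that joint model (a numpy sketch with an invented conjugate-normal model, not Stan code), full joint inference draws theta and y_miss together from the single posterior p(theta, y_miss | y_obs), rather than imputing in separate stages:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: y ~ N(theta, 1) with theta ~ N(0, 10^2), and some
# entries of y missing. Full Bayes treats y_miss as extra parameters.
y_obs = rng.normal(2.0, 1.0, size=40)
n_miss = 10

# Exact posterior of theta given the observed data (conjugate normal)
prec = 1.0 / 100.0 + len(y_obs)
post_mean, post_sd = np.sum(y_obs) / prec, np.sqrt(1.0 / prec)

# Joint posterior draws: theta | y_obs, then y_miss | theta for each draw,
# so each row pairs a theta value with y_miss values from the same draw.
theta = rng.normal(post_mean, post_sd, size=2000)
y_miss = rng.normal(theta[:, None], 1.0, size=(2000, n_miss))
```

In this toy the missing responses carry no extra information about theta, so the joint posterior factorizes neatly; in richer models (the cases Plummer discusses) the missing data would feed back into theta, which is exactly the information flow that cutting or staged imputation restricts.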

For a while, Andrew was working on a fully Bayesian alternative, but I don’t know what ever became of that work.