Posterior from Stan model as prior for another model

I obtained posterior samples for a parameter of interest, \lambda, using Stan. I would like to use this posterior as the prior in another model. The distribution looks like a beta distribution, so I wrote a Stan model to estimate beta parameters that approximate the distribution of \lambda.
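As an aside, the beta parameters can also be recovered by simple moment matching without a second Stan run. A sketch in Python; the `rng.beta` draws below are synthetic stand-ins for the actual posterior samples of \lambda:

```python
import numpy as np

def beta_moment_match(samples):
    """Method-of-moments fit of Beta(a, b) to samples on (0, 1)."""
    m, v = samples.mean(), samples.var()
    # For Beta(a, b): mean = a / (a + b) and
    # var = a * b / ((a + b)^2 * (a + b + 1)); solving gives:
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common

# Synthetic stand-in for posterior draws of lambda.
rng = np.random.default_rng(0)
draws = rng.beta(3.0, 7.0, size=50_000)

a_hat, b_hat = beta_moment_match(draws)  # should land near (3, 7)
```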

I could use this beta distribution as the prior for my next Stan model. For example,

lambda ~ beta(a,b)

where a and b are the posterior means or medians from the Stan model used to estimate the distribution of \lambda.

Instead of just passing in the posterior means or medians, could I pass in the posterior samples of a and b? That is,

data {
  ...
  real<lower=0> a[M];
  real<lower=0> b[M];
}
parameters {
  real<lower=0, upper=1> lambda;
}
model {
  ...
  lambda ~ beta(a, b);
}

I have run this model and it runs fine. I just want to check that it is doing what I think it is doing, i.e., that my prior is a beta(a, b) that includes the uncertainty in a and b in the analysis.

Edit: I think the answer to this question is no. I am not sure what passing in a[M] and b[M] does, but it gives a drastically different posterior for \lambda compared to using the posterior medians (which gives the distribution I was expecting).

I could be wrong, but I think the issue with passing a[M] and b[M] is that the prior on lambda now makes M contributions to the log posterior. You really want a single contribution that integrates over all the values of a and b. You could try something like

target += beta_lupdf(lambda | a, b) - log(M);

I haven’t tried this but I’m very curious if it would work.

1 Like

I tried this and it did not work. But I don’t think we want to subtract \log(M). If Stan is using
p(\lambda \mid a_1, b_1) \times \dots \times p(\lambda \mid a_M, b_M) as the prior, I think we want to replace this with [p(\lambda \mid a_1, b_1) \times \dots \times p(\lambda \mid a_M, b_M)]^{1/M}. I tried the following prior

target += (1/M) * beta_lupdf(lambda | a, b);

and it looks like this is correct, or close to it, but I am not certain.

Edit: I don’t think this is correct either. The densities p(\lambda \mid \text{median}(a), \text{median}(b)) and p(\lambda) [i.e., with a and b integrated out] are almost indistinguishable, but I get different results when using the medians of a and b for the prior on \lambda compared to using all the samples and writing

target += (1/M) * beta_lupdf(lambda | a, b);

Given that the densities p(\lambda \mid \text{median}(a), \text{median}(b)) and p(\lambda) are almost indistinguishable, I could just use the medians, but in general this may not hold, and using the medians may lose some of the information in the prior. It would be nice to know whether I can pass in samples of a and b (rather than just a summary of them) and how the model needs to be adjusted for this.

Another edit: Maybe the adjustment:

target += (1/M) * beta_lupdf(lambda | a, b);

is correct after all. I passed in the M-vectors a = (\text{median}(a), \dots, \text{median}(a)) and b = (\text{median}(b), \dots, \text{median}(b)) with the above adjustment, and the posterior for \lambda is almost indistinguishable from the posterior obtained with a = \text{median}(a) and b = \text{median}(b).
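For what it’s worth, if the 1/M factor is actually computed in floating point, this “geometric mean” prior has a closed form: \log \text{Beta}(\lambda \mid a, b) is linear in a and b, so averaging the M log densities gives exactly the log density of \text{Beta}(\text{mean}(a), \text{mean}(b)) plus a constant that doesn’t involve \lambda. That would explain why it tracks the plug-in prior so closely whenever the medians and means of a and b are near each other. A quick numerical check in Python (the gamma draws are made-up stand-ins for posterior draws of a and b):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(1)
# Made-up posterior draws for the beta parameters a and b.
a = rng.gamma(200.0, 1 / 50.0, size=1_000)  # centered near 4
b = rng.gamma(300.0, 1 / 50.0, size=1_000)  # centered near 6

lam = np.linspace(0.01, 0.99, 99)
# (1/M) * sum_m log Beta(lam | a_m, b_m): the averaged log prior.
avg_logpdf = np.mean(
    [beta.logpdf(lam, a_m, b_m) for a_m, b_m in zip(a, b)], axis=0
)
# Plug-in log prior at the posterior means of a and b.
plugin_logpdf = beta.logpdf(lam, a.mean(), b.mean())

# The two differ only by a lambda-independent constant.
diff = avg_logpdf - plugin_logpdf
```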

Stan’s not set up to do this in a single run.

Your expression lambda ~ beta(a, b) is equivalent to

for (m in 1:M) {
  lambda ~ beta(a[m], b[m]);
}

This is not what you want if your goal is to propagate uncertainty. What you would want to do is run the whole Stan model for each of several values of (a, b) and then combine the results. That’s known as “multiple imputation” in the literature.
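A sketch of that multiple-imputation workflow, using a toy second model where the new data are s successes in n Bernoulli trials, so a conjugate update can stand in for each Stan refit (the gamma draws for a and b are again made-up stand-ins for the first model’s posterior):

```python
import numpy as np

rng = np.random.default_rng(2)
# Made-up posterior draws of (a, b) from the first model.
a = rng.gamma(200.0, 1 / 50.0, size=200)  # centered near 4
b = rng.gamma(300.0, 1 / 50.0, size=200)  # centered near 6

# Toy second model: s successes in n Bernoulli trials with a
# Beta(a_k, b_k) prior on lambda; conjugacy stands in for a Stan refit.
n, s = 50, 20

pooled = []
for a_k, b_k in zip(a, b):
    # "Refit" the second model under this draw of the prior parameters...
    pooled.append(rng.beta(a_k + s, b_k + n - s, size=100))

# ...then pool the per-imputation posterior draws into one sample,
# which mixes over the uncertainty in (a, b).
pooled = np.concatenate(pooled)
```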

No. What you wrote here won’t change anything, because log(M) is just a constant, and constant shifts to the log density drop out of the posterior.

This will also not work, because 1 and M are integers, so 1/M evaluates to 0 under integer division. What will happen is that lambda will just be sampled from its uniform prior, assuming it’s properly constrained to (0, 1) in its declaration.

If you write 1.0 / M it will do what you expect (floating-point division), but it’s still not going to propagate uncertainty, because you’re sampling lambda from all of those densities at once.

There’s really not a good way to do what you’re trying to do in Stan.

2 Likes

Investigate mixture approximations. That’s the closest you’ll get. E.g., the RBesT package has good implementations, including for the multivariate normal (remember that a posteriori some parameters will be correlated, even if they weren’t in your initial prior).
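If I understand the mixture idea correctly, the target is the marginal prior p(\lambda) = \int \text{Beta}(\lambda \mid a, b)\, p(a, b)\, da\, db, which the posterior draws approximate as the equal-weight mixture (1/M) \sum_m \text{Beta}(\lambda \mid a_m, b_m); packages like RBesT then compress that M-component mixture down to a handful of components. A sketch of evaluating the uncompressed mixture in Python (the gamma draws are made-up stand-ins):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import beta

rng = np.random.default_rng(3)
# Made-up posterior draws of (a, b) from the first model.
a = rng.gamma(200.0, 1 / 50.0, size=500)
b = rng.gamma(300.0, 1 / 50.0, size=500)

# log p(lam) = log (1/M) sum_m Beta(lam | a_m, b_m), evaluated on a
# grid via log-sum-exp for numerical stability.
grid = np.linspace(0.001, 0.999, 999)
comp = beta.logpdf(grid[:, None], a[None, :], b[None, :])  # 999 x M
log_prior = logsumexp(comp, axis=1) - np.log(len(a))

# Sanity check: the mixture density integrates to about 1 over (0, 1).
area = np.exp(log_prior).sum() * (grid[1] - grid[0])
```

In Stan this equal-weight mixture would, as far as I can tell, be a log_sum_exp over the M beta_lpdf terms minus log(M), but that costs M density evaluations per leapfrog step, which is presumably why compressing to a few components is attractive.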

1 Like

See Robert Grant - stats
This works, but the code is inefficient and alpha-stage in R, and still in progress in Stata/Mata. Thence to Rcpp next year. Also, I have yet to do side-by-side comparisons with VAEs, GANs, and all those popular methods. It will be a helluva lot easier than mixtures, where you have to assure yourself of the fidelity of the mixture. Apologies for a brief reply, but email me if you want some pointers.

Robert

1 Like

You might be able to achieve the inference you want quickly and efficiently via importance sampling, using the previous MCMC samples as your proposal distribution. See Updating model based on new data via PSIS
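A bare-bones version of that idea (plain importance sampling, without the Pareto smoothing the linked topic adds): reuse the old posterior draws of \lambda as the proposal and weight each one by the likelihood of the new data. The beta draws and Bernoulli data below are made-up stand-ins:

```python
import numpy as np

rng = np.random.default_rng(4)
# Made-up posterior draws of lambda from the first model.
lam = rng.beta(4.0, 6.0, size=20_000)

# New data for the second model: s successes in n Bernoulli trials.
n, s = 50, 20

# Self-normalized importance weights: new-data likelihood at each draw.
log_w = s * np.log(lam) + (n - s) * np.log1p(-lam)
w = np.exp(log_w - log_w.max())
w /= w.sum()

# Weighted summaries, or resample to get approximate posterior draws.
post_mean = float(np.sum(w * lam))
resampled = rng.choice(lam, size=5_000, replace=True, p=w)
```

Under these stand-in numbers the exact updated posterior is Beta(24, 36) by conjugacy, so post_mean should land near 0.4.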

3 Likes