I just got this silly statistics question in my mind and could you please help to clarify? I guess it is NOT legitimate to subtract two posteriors to obtain the distribution of the difference between them if they are not sampled in the same model. Then is there any way to get the distribution of the difference, except for modeling them in one model (that’s why I find the question is kind of silly)? If not, in this case, how can I estimate for instance the mean and variance of the difference? If I assume that they are independent, I guess I could calculate the difference between two means and the variance would be sqrt(var1 + var2)? Thanks a lot.

If the two posteriors are genuinely independent (here I mean that the joint posterior for the parameters conditioned on the fixed data does not involve any dependency between the parameters in the first posterior and the parameters in the second posterior), then the correct Bayesian computation is to subtract one from the other iteration-wise.

The intuition here is that if the two posteriors are independent, then it makes no difference whether you sample the two models within a single Stan program that explicitly represents the joint distribution of all of the parameters, or whether you split them apart into separate Stan programs. In general, when you don’t need the parameters to talk to one another at all, it is useful to split them up into separate Stan programs, so that it will be simple to locate the source of any problems during fitting (i.e. problems that indicate lack of convergence). The main reason for putting the parameters together would be if you want the flexibility to let them talk to each other, for example via shared covariance parameters in a multivariate random effect.

Thank you @jsocolar. I totally agree with your two points. However, my question was about is it possible to get the distribution of the difference between two parameters that are sampled separately and assumed to be independent? As you mentioned, after fitting the model, we could conduct posterior predictive simulations for any kind of combination of parameters since they are sampled iteration-wise from the joint distribution. Then what if they are not sampled together, due to practical reasons or we really believe some are independent to the others.

Again, as long as there is no dependency between the parameters conditional on the prior and data, then it makes no difference whether you sample them “at once” or separately. As long as the “correct” joint posterior has complete independence between these parameters, then fitting them separately still yields the correct joint posterior.

Sorry, perhaps I’m misunderstanding, but what does the difference between the two posteriors even signify?

You can use it to calculate the difference in the mean of a parameter (or any function of the parameters) under the two different models, but that would just be a shortcut for calculating \left< f | M_1\right> - \left< f | M_2 \right>, which might be interesting but I don’t think is what you’re asking about, which mentions the difference between two different parameters. (Slogan: “the difference of the distributions is not the distribution of the difference”.)

I think that the OP uses “subtract two posteriors” to mean “subtract some quantity derived from posterior 2 from some other quantity derived from posterior 1, iteration-wise”. This exactly the right way to calculate the distribution of the difference. I’m not sure exactly what you mean by “the difference of the distributions” in your slogan, but of course you’re right than anything other than (something equivalent to) subtracting the samples iteration-wise won’t yield the correct distribution of the difference.