I understand this is a somewhat general question, but it has been bothering me recently. My use case is concerned with the variance of the posterior estimates, represented as the width of the 90% credible interval. The expectation is that with more data (a larger number of samples), the posterior variance should get smaller. I am wondering what factors determine how fast or slow the posterior variance decreases as the sample size increases, especially in complex hierarchical models where an analytical solution is not possible (that’s why we use Stan). Initially I thought I could make the posterior variance decrease faster by improving the priors, for example a more informative prior so the posterior estimates are constrained by it, or on the other hand a weak or non-informative prior so it can easily be overwhelmed by the likelihood. But the changes I tried in my model only had an impact when the sample size was extremely small; beyond that, the rate at which the posterior variance decreases does not respond to the prior changes.
I am wondering if anyone has run into a similar problem, where they needed to achieve a smaller posterior variance with a small sample size. Is there any way to establish an expected, or fastest possible, rate at which the posterior variance will decrease with sample size? I think this would also help with the sample size estimation problem for Bayesian data analysis.
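To be concrete about the kind of scaling I mean, here is a toy conjugate sketch (a beta-binomial example, nothing like my actual hierarchical model) where the 90% interval width can be computed directly:

```python
# Toy conjugate example (not my hierarchical model): beta-binomial, where the
# 90% credible interval has a closed form, so its width can be tracked as n grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_p = 0.3
a0, b0 = 1.0, 1.0                                # Beta(1, 1) prior

for n in [10, 100, 1000, 10000]:
    y = rng.binomial(n, true_p)                  # number of successes out of n trials
    post = stats.beta(a0 + y, b0 + n - y)        # conjugate posterior
    width = post.ppf(0.95) - post.ppf(0.05)      # 90% credible interval width
    print(f"n = {n:>6}: 90% CrI width = {width:.4f}")
# The width shrinks roughly like 1/sqrt(n); I'd like a way to reason about the
# analogous "expected speed" for a hierarchical model fit in Stan.
```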
You could try running more warm-up iterations, or just more iterations generally. My guess is that would give you more posterior samples drawn closer to convergence.
Thank you. This does not work in my case, as the MCMC chains have converged. What I am trying to achieve is to reduce the variance of the posterior estimates given a fixed sample size. I assume this will involve some change to the model itself, but I am looking for some guidance on that.
Can you share your model and/or some data? It’s tricky to think about what might reduce the variance in a model-agnostic way.
To clarify, chains can take a while to converge; if you’re collecting posterior samples on the way to convergence, then your posterior variance could be larger than it needs to be. So if you do more warm-up iterations, it’s more likely that you’ll be sampling at convergence rather than on the way to it, if that makes sense? I guess this is more likely to be the case for weird or complicated models.
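For what it’s worth, in a cmdstanpy workflow that would look something like the sketch below (the model file and data are just placeholders, not your model):

```python
# Minimal cmdstanpy sketch of the "more warm-up" idea; the model file and data
# here are placeholders, not the original poster's model.
from cmdstanpy import CmdStanModel

model = CmdStanModel(stan_file="model.stan")     # hypothetical Stan program
fit = model.sample(
    data={"N": 100, "y": [0.1] * 100},           # placeholder data
    chains=4,
    iter_warmup=2000,        # default is 1000; give adaptation more time
    iter_sampling=1000,
    seed=123,
)
print(fit.summary())         # check R-hat and ESS to confirm convergence
```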
Can I ask what you expect to happen to your posterior variance when your sample size is 100, 1000, 10000, 1000000, …?
Also, are you talking about the posterior variance or the standard error (MCSE) of a specific estimate (e.g. the posterior mean)?
Edit: what I’m trying to ask here is, if your system has some specified noise level, how could MCMC do better than that (for posterior predictive values)? It’s hard to say what you are trying to make smaller without seeing the model.
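To show what I mean by the distinction, here is a small made-up sketch (not based on your model): the posterior sd reflects uncertainty given the data, while the MCSE is only Monte Carlo error and shrinks as you run more iterations.

```python
# Made-up illustration of posterior sd vs. MCSE, pretending these are roughly
# independent posterior draws of one parameter (so ESS is about n_draws).
import numpy as np

rng = np.random.default_rng(7)
posterior_sd_true = 0.5            # uncertainty implied by the data/model

for n_draws in [1000, 10000, 100000]:
    draws = rng.normal(2.0, posterior_sd_true, size=n_draws)
    post_sd = draws.std(ddof=1)            # width of the posterior itself
    mcse = post_sd / np.sqrt(n_draws)      # Monte Carlo error of the posterior mean
    print(f"{n_draws:>7} draws: posterior sd = {post_sd:.3f}, MCSE = {mcse:.4f}")
# More iterations shrink the MCSE, but the posterior sd stays around 0.5:
# only more data (or a different model) can shrink that.
```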
I believe you are trying to figure out how to get more precise estimates (narrower credible intervals) of unknown quantities without having more data to estimate from?
There is no general way to do this - there is only so much information contained in any dataset. In most cases there is no magic to extract more information.
If you have very good theory to inform your modelling, using it can sometimes help. For example, if a very good predictor has been left out, adding it to the model should narrow your credible intervals. Similarly, incorporating additional structure of the actual problem into the model may (or may not) narrow the posterior credible intervals (e.g. if there is spatial structure, or you know that the relationship is quadratic in some predictors, etc.).
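As a rough sketch of the predictor point with simulated data (using the flat-prior approximation that the posterior sd of a coefficient is close to the classical standard error, rather than actually running Stan):

```python
# Rough sketch: adding a strong predictor narrows the interval for the
# intercept at the same sample size. Simulated data, flat-prior approximation.
import numpy as np

rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
y = 1.0 + 3.0 * x + rng.normal(scale=0.5, size=n)   # x explains most variation

# Model A: intercept only (predictor left out)
resid_sd_a = y.std(ddof=1)
se_a = resid_sd_a / np.sqrt(n)

# Model B: intercept + x
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_sd_b = (y - X @ beta).std(ddof=2)
se_b = resid_sd_b * np.sqrt(np.linalg.inv(X.T @ X)[0, 0])

print(f"intercept 'posterior sd' without x: {se_a:.3f}")   # roughly 0.2
print(f"intercept 'posterior sd' with x:    {se_b:.3f}")   # roughly 0.035
# Same n, but the interval for the intercept is much narrower once the
# predictor soaks up most of the residual variance.
```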
This is unsurprising to me - as you have more data, the prior influences the posterior less and less, and in the limit of infinite data the prior is completely irrelevant to the posterior (there are some caveats to this, but let’s ignore them for now). I would expect the prior to still matter even with large data if you make it very narrow, but this is also likely to make sampling difficult for Stan.
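A quick way to see this in closed form is the conjugate normal mean with known data sd, where the posterior precision is just the prior precision plus n times the data precision (a toy sketch, not your hierarchical model):

```python
# Toy closed-form check (conjugate normal mean with known data sd = 1):
# posterior variance = 1 / (1/prior_sd^2 + n/sigma^2), so the prior contributes
# a fixed precision that the data term n/sigma^2 quickly swamps.
import numpy as np

sigma = 1.0
for prior_sd in [0.5, 50.0]:                    # informative vs. very weak prior
    print(f"prior sd = {prior_sd}")
    for n in [5, 50, 500, 5000]:
        post_sd = np.sqrt(1.0 / (1.0 / prior_sd**2 + n / sigma**2))
        print(f"  n = {n:>5}: posterior sd = {post_sd:.4f}")
# For small n the two priors give visibly different posterior sds; by n = 500
# they are essentially identical - exactly the pattern described above.
```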