Marginal Estimates


When obtaining point estimates and credible intervals for parameters in an MCMC setup, the most common approach is to do this marginally (using just the samples of one parameter, which corresponds to integrating the other parameters out of the posterior).

However, another approach could be to jointly maximize the posterior.

Of course, it is much easier to work with the marginal densities, but is there some theory on why this makes sense, and on whether the joint or the marginal approach leads to better results?

Thank you!

The typical point estimate for a parameter in Bayesian statistics is the posterior mean because, if the model is correct, it minimizes the expected squared error of the estimate; the posterior median minimizes expected absolute error. The posterior mean is the same whether you compute it one parameter at a time or with all the parameters together.
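To see why the two computations agree: the mean of the joint draws is just the vector of per-parameter means, so computing it marginally or jointly gives identical answers. A minimal sketch with simulated correlated "posterior" draws (my own toy example, not from the post):

```python
import numpy as np

rng = np.random.default_rng(1)
# 5000 draws from a correlated 2-d "posterior"
samples = rng.multivariate_normal(
    mean=[1.0, -2.0],
    cov=[[1.0, 0.8], [0.8, 1.0]],
    size=5000,
)

joint_mean = samples.mean(axis=0)                           # mean of the joint draws
marginal_means = [samples[:, j].mean() for j in range(2)]   # one parameter at a time

# Componentwise identical: the mean commutes with marginalization.
print(np.allclose(joint_mean, marginal_means))  # True
```

The same is not true of the mode, which is the crux of the question.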

For finding a maximum likelihood or maximum a posteriori (MAP) estimate, we take the joint optimum. That is, for the MAP estimate, we take

\theta^* = \textrm{argmax}_\theta \ p(\theta \mid y).

This jointly optimizes the values in \theta.
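The joint optimum can differ from optimizing each marginal separately. A sketch (my own example, not from the post) using Neal's funnel, v \sim N(0, 3), x \mid v \sim N(0, e^{v/2}): the marginal mode of v is 0, but the joint mode sits at (v, x) = (-4.5, 0).

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_p(theta):
    """Negative log density of Neal's funnel, up to a constant:
    -log p(v, x) = v^2/18 + x^2/(2 e^v) + v/2."""
    v, x = theta
    return v**2 / 18.0 + x**2 / (2.0 * np.exp(v)) + v / 2.0

# Joint MAP: optimize over (v, x) together.
res = minimize(neg_log_p, x0=[0.0, 0.0])
print(res.x)  # close to (-4.5, 0.0), not the marginal mode v = 0
```

The funnel's narrowing variance pulls the joint mode to negative v, even though each marginal of v is centered at 0, which is one concrete reason the joint and marginal approaches can disagree.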

You cannot derive an MLE or MAP estimate by running MCMC, though I’ve seen people in applied fields try to take the maximum over the sampled posterior draws. This doesn’t work: MCMC samples from the high-probability region of the posterior, which in high dimensions is nowhere near the mode, so it will never get anywhere close to the maximum. You can see this by sampling a high-dimensional standard normal. I demonstrate the idea in my Getting started with Stan tutorial and in my case study on the curse of dimensionality and typical sets.
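The standard-normal demonstration mentioned above can be sketched in a few lines (my own illustration, assuming dimension d = 1000): the draws concentrate on a thin shell of radius roughly \sqrt{d}, so no draw comes anywhere near the mode at the origin.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1000
draws = rng.standard_normal((4000, d))     # 4000 exact draws from N(0, I_d)
radii = np.linalg.norm(draws, axis=1)      # distance of each draw from the mode

print(radii.min())                # even the closest draw is far from radius 0
print(radii.mean(), np.sqrt(d))   # typical radius is near sqrt(1000) ≈ 31.6
```

These are exact independent draws, so this is not a mixing problem; it is the geometry of the typical set, and MCMC inherits it.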
