Nothing to add to the title. I know that normal priors always pull the maximum-likelihood estimate toward the prior mean at least to some degree, but is the broader, more sweeping statement of the thread title also true?
This might be too pedantic, but priors don’t affect the maximum likelihood estimate at all. However, they do lead to other estimators, including the posterior mean, the posterior median, the posterior mode (MAP estimate) and so forth. The answer to your question depends on exactly what quantity you are wondering about comparing to the maximum likelihood estimate.
Actually, sorry. I just realized that none of these estimators is necessarily closer to the prior mean than the ML estimate is.
I meant pulling the posterior mean away from the ML estimate and toward the prior mean, but I should have been more explicit.
Is there an example of a scenario where the prior is non-uniform, and the posterior mean doesn’t end up somewhere between the MLE and the prior mean?
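Sure. Suppose the likelihood is standard normal (centered at 0, so the ML estimate is 0), and the prior is an equal-weight mixture of a normal centered at -1 and a normal centered at 100 (so the prior mean is 49.5). The posterior mean will get pulled negative, i.e., to the far side of the ML estimate from the prior mean.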
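A minimal sketch of that example, assuming (the post doesn't say) that both prior components have unit standard deviation and that the data is a single observation x = 0 with unit noise. Each mixture component gets the usual conjugate normal update, and the component weights are reweighted by how well each one predicts x; the function name `mixture_posterior_mean` is just for illustration.

```python
import math

def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd**2) - (x - mean) ** 2 / (2 * sd**2)

def mixture_posterior_mean(x, lik_sd, weights, means, sds):
    """Posterior mean for one observation x ~ N(theta, lik_sd^2) under a prior
    that is a mixture of normals N(means[k], sds[k]^2)."""
    log_w, comp_means = [], []
    for w, m, s in zip(weights, means, sds):
        # Component evidence: marginally, x ~ N(m, s^2 + lik_sd^2)
        log_w.append(math.log(w) + normal_logpdf(x, m, math.sqrt(s**2 + lik_sd**2)))
        # Conjugate normal update inside the component (precision-weighted mean)
        prec = 1 / s**2 + 1 / lik_sd**2
        comp_means.append((m / s**2 + x / lik_sd**2) / prec)
    # Normalize posterior component weights in log space to avoid underflow
    top = max(log_w)
    unnorm = [math.exp(lw - top) for lw in log_w]
    total = sum(unnorm)
    return sum(u / total * cm for u, cm in zip(unnorm, comp_means))

weights, means, sds = [0.5, 0.5], [-1.0, 100.0], [1.0, 1.0]
prior_mean = sum(w * m for w, m in zip(weights, means))  # 49.5
post_mean = mixture_posterior_mean(0.0, 1.0, weights, means, sds)
print(prior_mean, post_mean)  # the MLE is 0.0, but post_mean is -0.5
```

The component at 100 predicts x = 0 so badly that its posterior weight underflows to zero, leaving only the component at -1, which drags the posterior mean below the MLE.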
Wow, that’s one hell of a prior!
Can we at least trust that whenever the prior is unimodal, the posterior mean will end up somewhere between the MLE and the prior mean?
Nope. Suppose the prior is lognormal and the likelihood concentrates between its mean and its mode.
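A rough numerical check, with hypothetical numbers chosen for illustration: a lognormal prior with mu = 0 and sigma = 1 (mode e^-1 ≈ 0.37, mean e^0.5 ≈ 1.65) and a Gaussian-shaped likelihood centered at an MLE of 1.0 with width 0.1. Between its mode and its mean the prior density is decreasing, so it tilts the posterior toward the mode, and the posterior mean should land slightly below the MLE, on the opposite side from the prior mean. The integral is a plain Riemann sum on a grid, which is crude but adequate here.

```python
import math

def lognormal_pdf(theta, mu, sigma):
    return math.exp(-(math.log(theta) - mu) ** 2 / (2 * sigma**2)) / (
        theta * sigma * math.sqrt(2 * math.pi))

def posterior_mean_grid(mle, tau, mu=0.0, sigma=1.0, lo=1e-4, hi=5.0, n=50_000):
    """Posterior mean under a lognormal(mu, sigma) prior and a Gaussian-shaped
    likelihood exp(-(theta - mle)^2 / (2 tau^2)), via a Riemann sum on a grid."""
    num = den = 0.0
    step = (hi - lo) / n
    for i in range(n + 1):
        theta = lo + i * step
        w = lognormal_pdf(theta, mu, sigma) * math.exp(-(theta - mle) ** 2 / (2 * tau**2))
        num += theta * w
        den += w
    return num / den  # the grid spacing cancels in the ratio

mu, sigma = 0.0, 1.0
prior_mode = math.exp(mu - sigma**2)      # ~0.37
prior_mean = math.exp(mu + sigma**2 / 2)  # ~1.65
mle = 1.0                                 # between the prior mode and mean
post_mean = posterior_mean_grid(mle, tau=0.1, mu=mu, sigma=sigma)
print(prior_mode, mle, post_mean, prior_mean)  # post_mean comes out a bit below 1.0
```

For a narrow likelihood the shift is roughly tau^2 times the slope of the log prior at the MLE, which is -1 here, so the posterior mean ends up near 0.99.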
I think the property “the posterior mean will end up somewhere between the MLE and the prior mean” holds when the prior can be expressed as n_0 “fake” observations from the same distribution as the actual n observations. Then the posterior mean is a weighted average of the prior mean (from the fake data) and the observed mean (from the real data), with weights n_0 and n.
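The Beta-Bernoulli model is probably the simplest case of this “fake observations” reading: a Beta(a, b) prior behaves like n_0 = a + b pseudo-observations with mean a / n_0. The sketch below (hypothetical numbers; exact arithmetic via `fractions`) checks that the exact posterior mean equals the n_0-versus-n weighted average of the prior mean and the MLE.

```python
from fractions import Fraction

def beta_bernoulli_summary(a, b, successes, n):
    """Beta(a, b) prior read as n0 = a + b pseudo-observations with mean a / n0."""
    n0 = a + b
    prior_mean = Fraction(a, n0)
    mle = Fraction(successes, n)
    # Exact posterior mean of Beta(a + successes, b + n - successes)
    post_mean = Fraction(a + successes, n0 + n)
    # The same number written as a weighted average of prior mean and MLE
    weighted = (n0 * prior_mean + n * mle) / (n0 + n)
    return prior_mean, mle, post_mean, weighted

prior_mean, mle, post_mean, weighted = beta_bernoulli_summary(a=2, b=8, successes=7, n=10)
print(prior_mean, mle, post_mean, weighted)  # 1/5 7/10 9/20 9/20
```

Because the weights are both positive, the posterior mean is forced onto the interval between the prior mean and the MLE, which is exactly the property in question.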
The universe of possibilities here is big, which makes statements like this one incorrect in full generality, although they might be good heuristics for some set of real-world applications. Consider any prior and likelihood such that the posterior and prior means aren’t identical. We can always add a very narrow, finite spike to the likelihood function between the prior mean and the posterior mean. If the spike is tall enough to become the likelihood's global maximum but narrow enough that the posterior is essentially unaffected, the ML estimate moves onto the spike, and the distance between the prior and posterior means becomes larger than the distance between the ML estimate and the prior mean.
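A small sketch of the spike construction, under assumptions chosen only to keep the integrals closed-form: a N(0, 1) prior, a smooth likelihood exp(-(theta - 2)^2 / 2) (MLE 2, posterior mean 1), and a hypothetical Gaussian-shaped spike of height 2 and width 1e-4 at 0.5. Modelling every piece of the likelihood as a Gaussian bump lets the posterior mean be computed exactly, bump by bump.

```python
import math

def normal_pdf(x, mean, sd):
    return math.exp(-(x - mean) ** 2 / (2 * sd**2)) / (sd * math.sqrt(2 * math.pi))

def likelihood(theta, bumps):
    """Likelihood as a sum of Gaussian-shaped bumps given as (height, center, width)."""
    return sum(h * math.exp(-(theta - c) ** 2 / (2 * w**2)) for h, c, w in bumps)

def posterior_mean(bumps):
    """Exact posterior mean under a N(0, 1) prior, accumulated bump by bump."""
    num = den = 0.0
    for h, c, w in bumps:
        # Integral of N(theta; 0, 1) * h * exp(-(theta - c)^2 / (2 w^2)) over theta
        evidence = h * w * math.sqrt(2 * math.pi) * normal_pdf(c, 0.0, math.sqrt(1 + w**2))
        den += evidence
        num += evidence * c / (1 + w**2)  # this bump's conditional posterior mean
    return num / den

smooth = [(1.0, 2.0, 1.0)]            # ML estimate at 2
spiked = smooth + [(2.0, 0.5, 1e-4)]  # tall, very narrow spike at 0.5

print(posterior_mean(smooth))  # 1.0: prior mean 0 < posterior mean 1 < MLE 2
print(posterior_mean(spiked))  # slightly below 1.0: essentially unaffected
print(likelihood(0.5, spiked) > likelihood(2.0, spiked))  # True
```

The smooth part never exceeds 1 and the spike is negligible more than a few widths from 0.5, so the likelihood's maximum now sits on the spike at (essentially) 0.5, between the prior mean and the almost unchanged posterior mean.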
Actually, even more generally, note that the locations of the prior and posterior means are not invariant under reparameterization, but the location of the ML estimate is: if θ̂ is the ML estimate of θ, then g(θ̂) is the ML estimate of g(θ), whereas for nonlinear g, E[g(θ)] ≠ g(E[θ]) in general. So we can probably reparameterize just about any model into an equivalent model that places the ML estimate between the prior and posterior means.
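One concrete instance, with numbers picked purely for illustration: a uniform Beta(1, 1) prior on a success probability theta and 7 successes in 10 trials, then the monotone reparameterization phi = theta**5. The MLE simply maps to 0.7**5, but the prior and posterior means of phi are moments of the Beta distributions and move differently. This is one hand-picked example, not a general construction.

```python
from fractions import Fraction

def beta_moment(a, b, k):
    """E[theta**k] for theta ~ Beta(a, b), with positive integer a, b and k >= 0."""
    out = Fraction(1)
    for i in range(k):
        out *= Fraction(a + i, a + b + i)
    return out

def between(x, lo, hi):
    return min(lo, hi) <= x <= max(lo, hi)

a, b = 1, 1           # uniform prior on theta
successes, n = 7, 10
theta_mle = Fraction(successes, n)

results = {}
for k in (1, 5):      # phi = theta**k; k = 1 is the original parameterization
    prior_mean = beta_moment(a, b, k)
    post_mean = beta_moment(a + successes, b + n - successes, k)
    phi_mle = theta_mle ** k  # invariance of the MLE: phi_hat = theta_hat**k
    results[k] = (prior_mean, phi_mle, post_mean)
    print(k, float(prior_mean), float(phi_mle), float(post_mean),
          between(post_mean, prior_mean, phi_mle))
```

With k = 1 the posterior mean 2/3 sits between the prior mean 1/2 and the MLE 0.7; with k = 5 the order becomes prior mean 1/6 < MLE 0.16807 < posterior mean 33/182 ≈ 0.181, so the ML estimate lies between the prior and posterior means.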
Maybe this is the “solution”, as demoralizing as it may sound.
I don’t see why such a reparameterization would exist in general (e.g., in more than one dimension). Reparameterization changes both the prior and the posterior, so it’s one action with two related effects, and the prior and posterior may not change in a way that satisfies the “posterior estimate between prior and MLE” property any better. Your examples show the property doesn’t hold in general; I was wondering what assumptions guarantee that it does hold (hence the heuristic of “prior as additional observations”, though I realized it’s a very restrictive condition, since it assumes the same weight for all components of a multi-dimensional parameter).
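To illustrate the “same weight for all components” point, here is a sketch of a two-dimensional normal model with known unit noise and an independent (diagonal) normal prior, all numbers hypothetical. With equal prior variances the prior acts like the same number of fake observations in every coordinate and the posterior mean lies on the segment from the prior mean to the MLE; with unequal prior variances each coordinate is shrunk by a different amount and the posterior mean leaves that segment, even though each coordinate separately still lies between its prior mean and its MLE.

```python
from fractions import Fraction

def normal_posterior_mean(prior_mean, prior_var, xbar, n, noise_var=1):
    """Componentwise normal-normal update, assuming a diagonal prior covariance
    and known noise variance; xbar is the sample mean (the MLE)."""
    out = []
    for m0, v0, x in zip(prior_mean, prior_var, xbar):
        prior_prec = Fraction(1) / v0
        data_prec = Fraction(n) / noise_var
        out.append((prior_prec * m0 + data_prec * x) / (prior_prec + data_prec))
    return tuple(out)

def on_segment(p, a, b):
    """True if the 2-D point p lies on the segment from a to b (a != b)."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    dot = (p[0] - a[0]) * (b[0] - a[0]) + (p[1] - a[1]) * (b[1] - a[1])
    length_sq = (b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2
    return cross == 0 and 0 <= dot <= length_sq

prior_mean, mle = (0, 0), (1, 1)
same = normal_posterior_mean(prior_mean, (1, 1), mle, n=1)         # (1/2, 1/2)
different = normal_posterior_mean(prior_mean, (1, 100), mle, n=1)  # (1/2, 100/101)
print(on_segment(same, prior_mean, mle))       # True
print(on_segment(different, prior_mean, mle))  # False
```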