Do non-uniform priors always shrink the ML estimate towards the prior mean?

blokeman · August 23, 2023, 8:06am

Nothing to add to the title. I know that normal priors always pull the maximum-likelihood estimate toward the prior mean at least to some degree, but is the broader, more sweeping statement of the thread title also true?

jsocolar · August 23, 2023, 1:23pm

This might be too pedantic, but priors don’t affect the maximum likelihood estimate at all. However, they do lead to other estimators, including the posterior mean, the posterior median, the posterior mode (MAP estimate) and so forth. The answer to your question depends on exactly what quantity you are wondering about comparing to the maximum likelihood estimate.

jsocolar · August 23, 2023, 1:28pm

Actually, sorry: I just realized that none of these estimators is guaranteed to be closer to the prior mean than the ML estimate is.

blokeman · August 23, 2023, 3:12pm

I meant pulling the posterior mean away from the ML estimate and toward the prior mean, but should have been more explicit.

Is there an example of a scenario where the prior is non-uniform, and the posterior mean doesn’t end up somewhere between the MLE and the prior mean?

jsocolar · August 23, 2023, 9:37pm

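Sure. Suppose the likelihood is standard normal as a function of the parameter (so the ML estimate is 0), and the prior is an equal-weight mixture of a normal centered at -1 and a normal centered at 100. The prior mean is 49.5, but the posterior mean will get pulled negative, away from the prior mean rather than toward it.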
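For anyone who wants to check this numerically, here is a minimal sketch. It assumes both mixture components have unit standard deviation, which the example above leaves unspecified; the function names are just illustrative. With a normal likelihood and a normal-mixture prior, the posterior is again a normal mixture, so the posterior mean is available in closed form.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def mixture_posterior_mean(mle, lik_sd, components):
    """Posterior mean for a normal likelihood centered at `mle` and a
    normal-mixture prior given as a list of (weight, mean, sd) tuples."""
    masses, means = [], []
    for weight, mu, sd in components:
        # Posterior weight of this component is proportional to its prior
        # weight times the marginal likelihood under that component.
        masses.append(weight * normal_pdf(mle, mu, math.sqrt(sd**2 + lik_sd**2)))
        # Within a component, the usual conjugate normal update applies.
        precision = 1.0 / sd**2 + 1.0 / lik_sd**2
        means.append((mu / sd**2 + mle / lik_sd**2) / precision)
    total = sum(masses)
    return sum(m * mean for m, mean in zip(masses, means)) / total

# Assumption: unit standard deviation for both prior components.
components = [(0.5, -1.0, 1.0), (0.5, 100.0, 1.0)]
mle = 0.0
prior_mean = sum(w * mu for w, mu, _ in components)
post_mean = mixture_posterior_mean(mle, 1.0, components)

print("prior mean:", prior_mean)
print("MLE:", mle)
print("posterior mean:", post_mean)
```

Under these assumptions the component at 100 receives essentially zero posterior weight, so the posterior mean lands on the opposite side of the MLE from the prior mean.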

blokeman · August 24, 2023, 3:54pm

Wow, that’s one hell of a prior!

Can we at least trust that whenever the prior is unimodal, the posterior mean will end up somewhere between the MLE and the prior mean?

jsocolar · August 24, 2023, 3:59pm

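Nope. Suppose the prior is lognormal and the likelihood concentrates between its mean and its mode. There the prior density is decreasing, so the posterior mean gets pulled below the ML estimate (toward the mode), while the prior mean sits above it.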
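A rough numerical check of the lognormal case, using a simple midpoint-rule integration on a grid. The specific numbers are assumptions chosen for illustration: a lognormal(0, 1) prior (mode about 0.37, mean about 1.65) and a normal likelihood centered at 1.0 with standard deviation 0.1.

```python
import math

def lognormal_pdf(theta, mu=0.0, sigma=1.0):
    """Density of a lognormal(mu, sigma) distribution at theta > 0."""
    z = (math.log(theta) - mu) / sigma
    return math.exp(-0.5 * z * z) / (theta * sigma * math.sqrt(2.0 * math.pi))

def posterior_mean_on_grid(prior_pdf, likelihood, lo, hi, n=20000):
    """Midpoint-rule approximation of the posterior mean on (lo, hi)."""
    step = (hi - lo) / n
    num = den = 0.0
    for i in range(n):
        theta = lo + (i + 0.5) * step
        w = prior_pdf(theta) * likelihood(theta)
        num += theta * w
        den += w
    return num / den

# Assumed illustrative values, not taken from the thread.
mle, lik_sd = 1.0, 0.1
prior_mode = math.exp(-1.0)   # exp(mu - sigma^2) for lognormal(0, 1)
prior_mean = math.exp(0.5)    # exp(mu + sigma^2 / 2) for lognormal(0, 1)

def likelihood(theta):
    # Normal-shaped likelihood in theta, peaked at the MLE.
    return math.exp(-0.5 * ((theta - mle) / lik_sd) ** 2)

# The likelihood is negligible outside (0, 3), so the grid covers it.
post_mean = posterior_mean_on_grid(lognormal_pdf, likelihood, 0.0, 3.0)

print("prior mode:", prior_mode)
print("posterior mean:", post_mean)
print("MLE:", mle)
print("prior mean:", prior_mean)
```

With these values the posterior mean should come out slightly below the MLE, so it is not between the MLE and the prior mean even though the prior is unimodal.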

desislava · August 24, 2023, 5:05pm

I think the property “the posterior mean will end up somewhere between the MLE and the prior mean” holds when the prior can be expressed as n_0 “fake” observations from the same distribution as the actual n observations. Then the posterior is a weighted average between prior mean (from fake data) and observed mean (from real data).
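As one concrete instance of this "prior as fake observations" idea, here is a small sketch for a Beta prior on a binomial success probability. The choice of model and the function name are my own assumptions for illustration. A Beta(a, b) prior behaves like a + b pseudo-trials with a successes, and the posterior mean is a weighted average of the prior mean and the observed proportion.

```python
def beta_binomial_summary(prior_successes, prior_failures, successes, failures):
    """Return (prior mean, MLE, posterior mean) for a Beta prior on a
    binomial success probability, written as a weighted average."""
    n0 = prior_successes + prior_failures   # number of "fake" observations
    n = successes + failures                # number of real observations
    prior_mean = prior_successes / n0
    mle = successes / n
    weight = n0 / (n0 + n)                  # share of information from the prior
    post_mean = weight * prior_mean + (1.0 - weight) * mle
    return prior_mean, mle, post_mean

# Illustrative numbers: Beta(2, 8) prior, then 7 successes in 10 trials.
prior_mean, mle, post_mean = beta_binomial_summary(2, 8, 7, 3)
print(prior_mean, mle, post_mean)
```

Because the weight is always between 0 and 1, the posterior mean in this conjugate setup cannot leave the interval between the prior mean and the MLE.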

jsocolar · August 24, 2023, 7:31pm

The universe of possibilities here is big, so statements like this one are not correct in full generality, although they might be good heuristics for some set of real-world applications. Consider any prior and likelihood such that the posterior and prior means aren’t identical. We can always add a very narrow, finite spike to the likelihood function (narrow enough that the posterior is essentially unaffected, but tall enough to become the maximum) somewhere between the prior mean and the posterior mean. The ML estimate then sits at the spike, so the distance between the prior and posterior means is larger than the distance between the ML estimate and the prior mean.
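Here is a sketch of the spike construction, under assumptions picked purely for illustration: a N(0, 1) prior, a smooth normal-shaped likelihood peaked at 2 with unit width, and an added Gaussian-shaped spike of height 2 and width 1e-4 at 0.5. Because the likelihood is a sum of Gaussian bumps and the prior is normal, the posterior mean can be computed exactly as a mixture, with no grid needed.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def likelihood(theta, bumps):
    """Likelihood as a sum of bumps height * exp(-(theta - center)^2 / (2 width^2))."""
    return sum(h * math.exp(-0.5 * ((theta - c) / w) ** 2) for h, c, w in bumps)

def posterior_mean(prior_mean, prior_var, bumps):
    """Exact posterior mean for a normal prior and a sum-of-bumps likelihood."""
    masses, means = [], []
    for height, center, width in bumps:
        var = width ** 2
        # Each bump equals area * N(theta; center, var).
        area = height * width * math.sqrt(2.0 * math.pi)
        # Posterior mass contributed by this bump.
        masses.append(area * normal_pdf(center, prior_mean, prior_var + var))
        # Conjugate normal update for the bump's share of the posterior.
        means.append((prior_mean / prior_var + center / var)
                     / (1.0 / prior_var + 1.0 / var))
    return sum(m * mu for m, mu in zip(masses, means)) / sum(masses)

# Assumed illustrative setup.
prior_mean, prior_var = 0.0, 1.0
main = (1.0, 2.0, 1.0)     # (height, center, width) of the smooth likelihood
spike = (2.0, 0.5, 1e-4)   # tall, very narrow spike between the two means

smooth_post = posterior_mean(prior_mean, prior_var, [main])
spiked_post = posterior_mean(prior_mean, prior_var, [main, spike])

# Approximation: take the MLE as whichever bump center has the higher
# likelihood; the true maximizer is extremely close to the spike center.
mle = max([main[1], spike[1]], key=lambda t: likelihood(t, [main, spike]))

print("posterior mean without spike:", smooth_post)
print("posterior mean with spike:", spiked_post)
print("MLE with spike:", mle)
```

The spike carries almost no posterior mass, so the posterior mean barely moves, yet the MLE jumps to the spike and ends up closer to the prior mean than the posterior mean is.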

jsocolar · August 24, 2023, 7:36pm

Actually, even more generally, note that the locations of the prior and posterior means are not invariant under reparameterization (the mean of a transformed parameter is generally not the transform of the mean), whereas the ML estimate simply maps through the transformation. So in general we can probably reparameterize just about any model to an equivalent model that places the ML estimate between the prior and posterior means.

blokeman · August 25, 2023, 4:33am

Maybe this is the “solution”, as demoralizing as it may sound.

desislava · August 25, 2023, 6:31am

I don’t see why there would be such a reparameterization in general (e.g., in more than one dimension). Reparameterization changes both the prior and the posterior, so it’s one action with two related effects at once, and the prior and posterior may not change in a way that satisfies the “posterior estimate between prior and MLE” property any better. Your examples show the property doesn’t hold in general. I was wondering what assumptions guarantee that it holds (hence the heuristic of “prior as additional observations”, though I realize it’s a very restrictive condition, as it assumes the same weight for all components of a multi-dimensional parameter).
