I just heard about Bayes factors for the first time. Why is it such a big deal to compare the posterior probabilities of two models? Isn't a model's posterior probability simply `exp(elpd_loo + p_loo)`? Sounds easy to compute a ratio, no? Or are the values simply too small?
Also, why does the Wikipedia article say “an advantage of the use of Bayes factors is that it automatically, and quite naturally, includes a penalty for including too much model structure”? What penalty? I don’t see any penalty in the quotient of two probabilities. Is it referring to priors? But what if your priors are flat? Doesn’t the “penalty” then disappear?
Curious newbies yearn to know…
The numerator and denominator in the Bayes factor are the likelihoods integrated over the prior (NOT the likelihood integrated over the posterior). See equation 2 here https://www.andrew.cmu.edu/user/kk3n/simplicity/KassRaftery1995.pdf
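Written out in the Wikipedia-style notation used below (writing $\pi(\theta_k | M_k)$ for the prior of model $M_k$), the quantity in both the numerator and the denominator is the marginal likelihood, i.e. the likelihood averaged over the prior:

$$
\textrm{Pr}(D | M_k) = \int \textrm{Pr}(D | \theta_k, M_k)\,\pi(\theta_k | M_k)\,d\theta_k,
\qquad
K = \frac{\textrm{Pr}(D | M_1)}{\textrm{Pr}(D | M_2)}
$$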
If an unbounded parameter gets a flat prior, that prior is improper (it cannot be normalized; any normalized version would have density zero everywhere), so such a model doesn't work with Bayes factors. If we instead take a proper prior and let it get flatter and flatter, the prior density becomes minuscule everywhere and the model gets penalized into oblivion. So if the priors are flat, the penalty does the opposite of disappearing. If the priors are very narrow, then the penalty gets smaller, which should be expected, as the model has, in effect, a less flexible structure.
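Here is a minimal numerical sketch of that effect (a toy model with made-up numbers, not from any real analysis): a single observation y ~ Normal(mu, 1) with a Normal(0, tau) prior on mu, with the marginal likelihood computed by brute-force quadrature. Widening the prior drives the marginal likelihood toward zero.

```python
# Toy demonstration: the marginal likelihood is the likelihood averaged over
# the prior, so spreading the prior thinner shrinks it toward zero.
import numpy as np
from scipy import stats
from scipy.integrate import quad

y = 0.3  # made-up observation

def marginal_likelihood(tau):
    # integrate p(y | mu) * p(mu) over mu, for the prior mu ~ Normal(0, tau)
    integrand = lambda mu: stats.norm.pdf(y, loc=mu, scale=1.0) * stats.norm.pdf(mu, loc=0.0, scale=tau)
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

for tau in [0.5, 1.0, 5.0, 50.0, 500.0]:
    print(f"prior sd = {tau:6.1f}  ->  Pr(D | M) = {marginal_likelihood(tau):.5f}")
```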
The tricky part here in the notation of Wikipedia and of Kass & Raftery is to realize that when they write (here following the Wikipedia notation) $\textrm{Pr}(D|M_1)$ they are not referring to the likelihood for some specific (e.g. draw-wise) value of the model parameters $\theta_1$ (i.e. $\textrm{Pr}(D|\hat{\theta}_1)$), nor to this likelihood integrated over the posterior for $\theta_1$, but rather to the likelihood integrated over the prior. Here $M_1$ refers to the entire Bayesian model 1, including its prior.
Wow, thanks!
Doesn’t all of this imply an extremely severe penalty for every added parameter? Isn’t it the case that every added parameter expands the parameter space exponentially, so that the “total probability mass” of 1 is spread exponentially thinner, resulting in much lower marginal probabilities unless the added parameter makes an earth-shattering predictive contribution?
And doesn’t it also imply that even with just a single parameter to estimate, any kind of prior (say, normal or uniform) that is symmetrically centered on the true value will result in a lower marginal probability the wider we allow its spread to be? E.g. if we’re estimating a binomial p whose true value happens to be 0.5, a model with a U(0.3, 0.7) prior on p will have a higher marginal probability than an otherwise identical model where the prior is U(0.1, 0.9)?
Yes, if you widen the priors, then, assuming that the likelihood gets lower in the tails of the priors, the Bayes factor penalizes the widening. This is entirely consistent with the idea of penalizing model complexity/flexibility, because wider priors equal more flexibility. If you are willing to assume that the likelihood gets lower in the tails of your priors (and you think working with Bayes factors is a good idea), then you shouldn't be widening your priors.
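To put numbers on the binomial example from the question, here is a quick sketch with made-up data (k = 10 successes in n = 20 trials, i.e. data consistent with p = 0.5):

```python
# Compare marginal likelihoods under a U(0.3, 0.7) and a U(0.1, 0.9) prior
# on the binomial probability p, for made-up data of 10 successes in 20 trials.
from scipy import stats
from scipy.integrate import quad

n, k = 20, 10

def marginal_likelihood(lo, hi):
    # likelihood averaged over a Uniform(lo, hi) prior on p
    integrand = lambda p: stats.binom.pmf(k, n, p) / (hi - lo)
    value, _ = quad(integrand, lo, hi)
    return value

m_narrow = marginal_likelihood(0.3, 0.7)  # U(0.3, 0.7) prior
m_wide = marginal_likelihood(0.1, 0.9)    # U(0.1, 0.9) prior
print(f"Pr(D | M_narrow) = {m_narrow:.4f}")
print(f"Pr(D | M_wide)   = {m_wide:.4f}")
print(f"Bayes factor (narrow over wide) = {m_narrow / m_wide:.2f}")
```

The narrower prior wins, because the wider prior spends mass on values of p where the likelihood is low.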
Whether adding a parameter results in a penalty (and preference for the simpler model) depends on the competition between two things:
- Does the inclusion of the parameter yield higher likelihoods near the best fitting value?
- Does the prior encompass regions of parameter space over the new parameter where the likelihood is lower than it would be without including that parameter?
For example, if I add a parameter that yields modestly larger likelihoods everywhere in parameter space that is consistent with the prior that I place on it, then there will be no penalty even though the increase in the likelihood is modest. What matters is the change in the likelihood integrated (i.e. averaged) over the prior.
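As a toy illustration of that competition (made-up numbers; with the residual variance known and fixed at 1 so the marginal likelihoods have a closed form), compare M0, which fixes the mean at zero, against M1, which adds a mean parameter mu with a Normal(0, tau^2) prior:

```python
# M0: y_i ~ Normal(0, 1).  M1: y_i ~ Normal(mu, 1) with mu ~ Normal(0, tau^2).
# With the variance known, the Bayes factor depends on the data only through
# the sample mean ybar:
#   BF(M1 vs M0) = N(ybar | 0, tau^2 + 1/n) / N(ybar | 0, 1/n)
import numpy as np
from scipy import stats

n, ybar = 20, 0.35  # made-up sample size and sample mean

def bayes_factor(tau):
    m1 = stats.norm.pdf(ybar, loc=0.0, scale=np.sqrt(tau**2 + 1.0 / n))
    m0 = stats.norm.pdf(ybar, loc=0.0, scale=np.sqrt(1.0 / n))
    return m1 / m0

for tau in [0.1, 0.5, 2.0, 10.0]:
    print(f"prior sd = {tau:5.1f}  ->  BF(M1 vs M0) = {bayes_factor(tau):.3f}")
```

With the sample mean sitting a bit away from zero, a tight prior on the new parameter lets M1 win modestly, while a very wide prior hands the win back to the simpler model.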
Thanks for coming to the rescue once more, Jacob.