Bayesian hypothesis testing

Hi, just a few more thoughts on this, as I’ve just written an answer on a similar topic.

The elephant in the room: all of these approaches are conditional on the model(s) you use being 100% correct (including priors etc.). Since (outside some parts of physics) your model is basically never even roughly correct, model comparison/selection/hypothesis testing - whatever you call it - can be very brittle. Various approaches fail in different contexts, and there is no general “best” way to do this. You need to be roughly correct about the “important” things, but what counts as “important” is not fixed. A huge question is what your actual final goal is.

Some of the options you have:

Determine a range of practical equivalence (expanding on what @JimBob wrote). Strictly speaking, P(\beta_1 = 0) = 0 for any continuous prior on \beta_1 (and that’s before we even start combining multiple parameters). And that makes sense - nature doesn’t like zeroes, most things have small and/or highly variable effects, and believing the mean effect is exactly zero makes IMHO little sense. But you can use domain expertise to say that e.g. a difference of 0.5 is practically irrelevant. P(|\beta_1| < 0.5), and by extension P(|\beta_1| < 0.5 \cap |\beta_2| < 0.5 \cap \dots \cap |\beta_n| < 0.5), can be computed directly from posterior samples (though the joint probability will shrink substantially as you add more parameters). If you don’t want to commit to a single strict threshold (which IMHO you shouldn’t), you can compute the probability for a range of thresholds. Or you can compute the posterior distribution of \max_i |\beta_i| or of \sum_i \beta_i^2 and make decisions based on that - see the sketch below.
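Just to make this concrete, here is a minimal sketch of the computations above, assuming `fit` is a fitted brms model and `b_x1` ... `b_x3` are the (hypothetical) names of the effects you care about - adjust to your own model:

```r
library(posterior)

# Extract posterior draws and keep only the coefficients of interest
# (names are hypothetical - check variables(fit) for yours)
draws <- as_draws_matrix(fit)
betas <- draws[, c("b_x1", "b_x2", "b_x3")]

# Marginal probability that each effect is practically irrelevant (|beta| < 0.5)
colMeans(abs(betas) < 0.5)

# Joint probability that *all* effects are within the region at once
mean(apply(abs(betas) < 0.5, 1, all))

# Same joint probability over a range of thresholds instead of one cutoff
thresholds <- seq(0.1, 1, by = 0.1)
sapply(thresholds, function(t) mean(apply(abs(betas) < t, 1, all)))

# Posterior distributions of combined summaries of the effects
max_abs <- apply(abs(betas), 1, max)  # max_i |beta_i|
sum_sq  <- rowSums(betas^2)           # sum_i beta_i^2
quantile(max_abs, c(0.05, 0.5, 0.95))
```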

Compare a simpler model to the full model: Separately fit a model with fewer parameters, omitting some of the effects. You can use the loo package to approximate a comparison of predictive performance via leave-one-out cross-validation - see the sketch below. Alternatively, you could use Bayes factors for this, but those can be problematic, as they are very sensitive to the priors you use in your model; there is some interesting criticism of them by Danielle Navarro and Data Colada. (Disclaimer: I’ve never used Bayes factors myself.) Either way, what you get is the relative expected predictive performance (LOO) or the relative improvement in KL-divergence to the true process (BF) for each model. Do you care about either of those?
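Roughly, the loo workflow could look like the sketch below; the formulas, data and object names are made up for illustration:

```r
library(brms)
library(loo)

# Full model and a simpler model with some effects omitted (hypothetical formulas)
fit_full  <- brm(y ~ x1 + x2 + x3, data = d)
fit_small <- brm(y ~ x1, data = d)

# Approximate leave-one-out cross-validation for each model
loo_full  <- loo(fit_full)
loo_small <- loo(fit_small)

# Difference in expected log predictive density (elpd) and its SE;
# a difference that is small relative to its SE suggests the extra
# terms add little predictive value.
loo_compare(loo_full, loo_small)

# If you do want Bayes factors instead (sensitive to priors!), brms can
# compute them via bridge sampling, provided both models were fitted
# with save_pars = save_pars(all = TRUE):
# bayes_factor(fit_full, fit_small)
```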

Think qualitatively: Danielle Navarro has a great essay about model selection and how purely mathematical approaches can fail us: Between the devil and the deep blue sea. Checking whether the models satisfy some qualitative properties can also be of interest.

Hope that makes sense and is at least a bit helpful.
