Model comparisons and point hypotheses

Short answer: point hypotheses are hard to reconcile with the Bayesian paradigm, but there are multiple ways to achieve something similar, depending on your overarching goals.

Longer answer:
Some of the options you have:

Compare a simpler model to the full model: Separately fit a model with fewer parameters (where literally y_i=f(x_i,a,b,\theta)+\epsilon_i, z_i=g(x_i,a,b,\theta)+\delta_i, i.e. the same a and b enter both curves). You can use the loo package to approximately compare predictive performance via leave-one-out cross-validation. Alternatively you could use Bayes factors, but those can be problematic, as they are very sensitive to the priors you use in your model; see also some interesting criticism by Danielle Navarro or Data Colada. (Disclaimer: I've never used Bayes factors myself.) Neither of those approaches will give you the probability P(a=c \cap b=d); instead you get relative expected predictive performance (LOO) or relative KL-divergence (BF) of each model. Do you care about those?
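To make the comparison concrete, here is a minimal Python sketch using ArviZ's PSIS-LOO comparison (an analogue of the R loo package mentioned above). Everything here is fabricated for illustration: the data, the "full" and "reduced" posteriors, and the way the pointwise log-likelihoods are produced; in practice these would come from your actual fitted models.

```python
import numpy as np
import arviz as az
from scipy import stats

rng = np.random.default_rng(1)
n_obs, n_chains, n_draws = 50, 4, 500

# Fake observed data (a stand-in for the y_i / z_i in your model).
y_obs = rng.normal(loc=1.0, scale=1.0, size=n_obs)

def fake_idata(mu_draws, sigma=1.0):
    """Build an InferenceData with a pointwise log-likelihood, which LOO needs."""
    # mu_draws has shape (chains, draws); broadcast against the observations.
    log_lik = stats.norm.logpdf(y_obs, loc=mu_draws[..., None], scale=sigma)
    return az.from_dict(
        posterior={"mu": mu_draws},
        log_likelihood={"y": log_lik},
        observed_data={"y": y_obs},
    )

# Pretend posteriors: the "full" model sits near the truth, the "reduced" model
# (shared a, b across both curves) is slightly off.
idata_full = fake_idata(rng.normal(1.0, 0.1, size=(n_chains, n_draws)))
idata_reduced = fake_idata(rng.normal(0.7, 0.1, size=(n_chains, n_draws)))

# The output ranks models by relative expected predictive performance -
# note that this is not P(a = c and b = d).
print(az.compare({"full": idata_full, "reduced": idata_reduced}))
```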

Determine a range of practical equivalence: Strictly speaking, P(a=c \cap b=d) = 0 for all continuous priors on a,b,c,d. And that makes sense - nature doesn't like zeroes; most things have small and/or highly variable effects, and believing the mean effect is exactly zero makes IMHO little sense. But you can use domain expertise to say that, e.g., a difference of 0.5 is practically irrelevant. P(|a-c| < 0.5 \cap |b-d| < 0.5) can then be computed directly from posterior samples. If you don't want to commit to a strict threshold (which you IMHO shouldn't), you can compute the probability for a range of thresholds. Or you can compute the posterior distribution of \max\{|a-c|, |b-d|\} or of (a-c)^2 + (b-d)^2 and make decisions based on that (see the sketch below).
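A minimal sketch of those practical-equivalence computations, assuming you already have joint posterior draws of a, b, c, d. The draws here are faked with random numbers purely for illustration; in your case they would be columns of the posterior sample matrix from the joint fit.

```python
import numpy as np

rng = np.random.default_rng(2)
n_draws = 4000
a = rng.normal(1.0, 0.3, n_draws)
b = rng.normal(2.0, 0.3, n_draws)
c = rng.normal(1.2, 0.3, n_draws)
d = rng.normal(2.1, 0.3, n_draws)

# P(|a - c| < 0.5 and |b - d| < 0.5): the proportion of draws inside the region.
threshold = 0.5
p_equiv = np.mean((np.abs(a - c) < threshold) & (np.abs(b - d) < threshold))
print(f"P(|a-c| < {threshold} and |b-d| < {threshold}) ~ {p_equiv:.3f}")

# The same probability across a range of thresholds, if you don't want to
# commit to a single cutoff.
for t in (0.1, 0.25, 0.5, 1.0):
    p = np.mean((np.abs(a - c) < t) & (np.abs(b - d) < t))
    print(f"threshold {t}: {p:.3f}")

# Posterior of max{|a-c|, |b-d|}; summarise however suits your decision problem.
max_diff = np.maximum(np.abs(a - c), np.abs(b - d))
print("median and 90% interval of max difference:",
      np.quantile(max_diff, [0.5, 0.05, 0.95]))
```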

Think qualitatively: Danielle Navarro has a great essay, Between the devil and the deep blue sea, about model selection and how purely mathematical approaches can fail us. Checking whether the models satisfy some qualitative properties can also be of interest.

Hope that makes sense.
