# Model comparisons and point hypotheses

I have a quick general question about point hypotheses in models.

Say I have a predictor data vector x which predicts two observed data vectors y, z with measurement errors \epsilon, \delta drawn from known distributions. The model has some parameters \theta shared between y and z and some parameters a, b, c, d which are not shared.

y_i=f(x_i,a,b,\theta)+\epsilon_i
z_i=g(x_i,c,d,\theta)+\delta_i
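For concreteness, a simulated toy instance of this setup (assuming, purely for illustration, linear-plus-quadratic f and g and normal errors; all specific values are made up) could look like:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy instance of the model y_i = f(x_i, a, b, theta) + eps_i,
#                           z_i = g(x_i, c, d, theta) + delta_i.
n = 100
x = rng.normal(size=n)
theta = 0.5                  # shared parameter
a, b = 1.0, 2.0              # parameters used only for y
c, d = 1.0, 2.0              # parameters used only for z (equal to a, b here)

eps = rng.normal(scale=0.3, size=n)    # known error distribution for y
delta = rng.normal(scale=0.4, size=n)  # known error distribution for z

y = a + b * x + theta * x**2 + eps     # f(x, a, b, theta)
z = c + d * x + theta * x**2 + delta   # g(x, c, d, theta)
```

The question is then whether, after fitting both equations jointly, the posterior supports a = c and b = d.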

I want to know the value of P(a=c \cap b=d), i.e. the probability that the a and b parameters used to fit y are equal to the c and d parameters used to fit z. This is analogous to a point hypothesis in lower-dimensional settings.

Is model comparison the only way of computing such a probability (since a model in which the restriction a=c \cap b=d applies is more parsimonious)? If so, what type of model comparison would be most recommended for a high-dimensional problem?

Short answer: point hypotheses are hard to reconcile with the Bayesian paradigm; there are multiple ways to achieve something similar, depending on your overarching goals.

Longer answer:
Some of the options you have:

Compare a simpler model to the full model: Separately fit a restricted model with fewer parameters (where literally y_i=f(x_i,a,b,\theta)+\epsilon_i and z_i=g(x_i,a,b,\theta)+\delta_i). You can use the loo package to approximately compare predictive performance via leave-one-out cross-validation. Alternatively, you could use Bayes factors, but those can be problematic, as they are very sensitive to the priors you use in your model; see also more detailed criticism by Danielle Navarro or Data Colada. (Disclaimer: I've never used Bayes factors myself.) Neither of those approaches will give you the probability P(a=c \cap b=d); instead you get relative expected predictive performance (LOO) or relative marginal likelihood (BF) of each model. Do you care about those?
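In case it helps, here is a crude numpy-only sketch of the idea behind a LOO comparison, using plain importance-sampling LOO on simulated stand-in posterior draws (the loo package and ArviZ use the more robust PSIS smoothing instead; the linear model, the draws, and all numbers below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: y generated from a linear model (assumption for illustration).
n, S = 50, 2000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
sigma = 0.5

# Pretend these are S posterior draws of (a, b) from the full model and from a
# restricted (mis-centred) model; in practice these come from your MCMC fit.
a_full, b_full = 1.0 + 0.1 * rng.normal(size=S), 2.0 + 0.1 * rng.normal(size=S)
a_res, b_res = 1.2 + 0.1 * rng.normal(size=S), 1.8 + 0.1 * rng.normal(size=S)

def elpd_is_loo(a, b):
    # Pointwise log-likelihood matrix: S draws x n observations.
    mu = a[:, None] + b[:, None] * x[None, :]
    loglik = -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * ((y - mu) / sigma) ** 2
    # Importance-sampling LOO: elpd_loo_i = -log( mean_s exp(-loglik_si) ).
    # Crude and high-variance; real workflows use PSIS-LOO (loo / ArviZ).
    return np.sum(-np.log(np.mean(np.exp(-loglik), axis=0)))

diff = elpd_is_loo(a_full, b_full) - elpd_is_loo(a_res, b_res)
print(diff)  # positive difference favours the full model
```

Note this answers "which model predicts better", not "what is P(a=c \cap b=d)".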

Determine a range of practical equivalence: Strictly speaking, P(a=c \cap b=d) = 0 for all continuous priors on a, b, c, d. And that makes sense: nature doesn't like exact zeroes, most things have small and/or highly variable effects, and believing an effect is exactly zero makes IMHO little sense. But you can use domain expertise to say that e.g. a difference of 0.5 is practically irrelevant. P(|a-c| < 0.5 \cap |b-d| < 0.5) can then be computed directly from posterior samples. If you don't want to commit to a single strict threshold (which IMHO you shouldn't), you can compute the probability for a range of thresholds. Or you can compute the posterior distribution of \max\{|a-c|, |b-d|\} or of (a-c)^2 + (b-d)^2 and make decisions based on that.
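All of those quantities are one-liners over posterior draws. A minimal sketch (the draws below are simulated stand-ins; in practice you'd use the actual samples of a, b, c, d from your fitted model, and the 0.5 threshold is just the example value above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws for a, c, b, d (stand-ins for your MCMC samples).
S = 4000
a = rng.normal(1.0, 0.2, S)
c = rng.normal(1.1, 0.2, S)
b = rng.normal(2.0, 0.3, S)
d = rng.normal(2.3, 0.3, S)

# Probability of practical equivalence at a single threshold:
p_equiv = np.mean((np.abs(a - c) < 0.5) & (np.abs(b - d) < 0.5))

# The same probability over a range of thresholds, avoiding a single cutoff:
thresholds = np.linspace(0.05, 1.0, 20)
curve = [np.mean((np.abs(a - c) < t) & (np.abs(b - d) < t)) for t in thresholds]

# Or summarise the posterior of the maximum discrepancy directly:
max_disc = np.maximum(np.abs(a - c), np.abs(b - d))
print(p_equiv, np.quantile(max_disc, [0.05, 0.5, 0.95]))
```

The threshold curve is often more honest to report than a single probability, since it shows how the conclusion depends on what "practically equivalent" means.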

Think qualitatively: Danielle Navarro has a great essay about model selection and how purely mathematical approaches can fail us: Between the devil and the deep blue sea. Checking whether the models satisfy some qualitative properties can also be of interest.

Hope that makes sense.