# Interpreting PSIS-LOO weights (in relation to Bayes' theorem?)

I’ve been using Stan to compare three theoretical models (which I differentiate by placing different priors on some parameters, as shown at the bottom of this post). I thought I had a handle on what I was doing, but I’m now realising I’m a bit confused about how to interpret the results. At the moment I am calculating PSIS-LOO weights to compare the models. My confusion is mainly about the interpretation of these weights and whether they are really what I want to report.

What I have done is fit three models (H_1, H_2 \ \& \ H_{null}; I give an overview of these models at the bottom of this post for context), then used the loo package in R to generate elpd_loo values for each model, and then calculated Akaike weights as follows:

```r
loo_weights <- exp(c(elpd1, elpd2, elpdNull)) / sum(exp(c(elpd1, elpd2, elpdNull)))
```
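(As an aside on the code itself: exponentiating raw elpd values like this can underflow to 0/0, since elpd_loo is often a large negative number. Subtracting the maximum before exponentiating gives exactly the same weights, because the shift cancels in the ratio. The elpd values below are invented purely for illustration:)

```r
# Hypothetical elpd_loo values for the three models (illustrative only)
elpds <- c(elpd1 = -152.3, elpd2 = -154.1, elpdNull = -160.8)

# Same Akaike-style weights, stabilised by subtracting max(elpds)
# before exponentiating (the constant shift cancels in the ratio)
loo_weights <- exp(elpds - max(elpds)) / sum(exp(elpds - max(elpds)))
```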


As I understand it, these weights can be roughly interpreted as the relative probability of each model being the best predictor of expected out-of-sample data. For example, page 199 of Statistical Rethinking quotes Akaike’s interpretation that:

> A model’s weight is an estimate of the probability that the model will make the best predictions on new data, conditional on the set of models considered.

My confusion is mainly about how this interpretation maps onto Bayes’ theorem. For example, if I took the ratio of the weights for H_1 and H_2, would this be somewhat equivalent to the posterior ratio:

$$\frac{p(H_1| \textrm{data})}{p(H_2| \textrm{data})} = \frac{p(\textrm{data}|H_1)}{p(\textrm{data}|H_2)}\times \frac{p(H_1)}{p(H_2)} \ ?$$

Or, perhaps more accurately, given I have expected lpd rather than lpd:

$$\frac{p(H_1| \textrm{expected new data})}{p(H_2| \textrm{expected new data})} = \frac{p(\textrm{expected new data}|H_1)}{p(\textrm{expected new data}|H_2)}\times \frac{p(H_1)}{p(H_2)}$$

If so, what would the values of p(H_1) and p(H_2) be in this context, given I have only specified different priors around the parameters for each model and have not explicitly said something like p(H_1)=0.3, p(H_2)=0.5 and p(H_{null})=0.2?

If not, is there some alternative interpretation of these weights in terms of Bayes’ theorem that I am missing? More generally, if I am interested in the relative probability of these models, does calculating PSIS-LOO weights seem reasonable here, or would some alternative approach (perhaps using the bridgesampling package, or changing the model specifications entirely) be more appropriate?

---

The following is a simplified summary of the models I am fitting, to provide context for the above question.

I have three theories (H_1, H_2 \ \& \ H_{null}). I can represent the predictions of these theories using ordinal logistic regressions with different priors on the parameters. E.g. H_1 says that a particular parameter tends to be positive, H_2 says that that parameter will be negative and H_{null} simply says that every parameter is 0. My hope is to arrive at some estimate of the relative probability of each of these theories. For example, let’s say H_1 was as follows:

```stan
parameters{
  real<lower=0> b1;
  real b2;
  real<lower=0> b3;
  ordered[5] cutpoints;
}
model{
  vector[N] phi;
  cutpoints ~ normal( 0 , 10 );
  b1 ~ normal( 0 , 1 );
  b2 ~ normal( 0 , 1 );
  b3 ~ normal( 1 , 1 );
  for ( i in 1:N ) {
    phi[i] = b1*X1[i] + b2*X2[i] + b3*X1[i]*X2[i];
  }
  for ( i in 1:N ) {
    response[i] ~ ordered_logistic( phi[i] , cutpoints );
  }
}
```


In contrast, the predictions of theory two (H_2) can be represented by the following changes to the priors in the model:

```stan
parameters{
  real<lower=0> b1;
  real b2;
  real<upper=0> b3; // This has changed from H_1
  ordered[5] cutpoints;
}
model{
  vector[N] phi;
  cutpoints ~ normal( 0 , 10 );
  b1 ~ normal( 0 , 1 );
  b2 ~ normal( 0 , 1 );
  b3 ~ normal( 0 , 1 ); // This has also changed from H_1
  ...
}
```


H_{null} simply says that b1 = b2 = b3 = 0.

---

Don’t do that; use `loo_model_weights()`.
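For instance, a minimal sketch (the object and list names here are hypothetical; `loo1`, `loo2` and `looNull` would be the outputs of `loo()` for the three fitted models):

```r
library(loo)

# Hypothetical: loo1, loo2, looNull are 'loo' objects returned by loo()
# for each of the three fitted models
# wts <- loo_model_weights(list(H1 = loo1, H2 = loo2, Hnull = looNull),
#                          method = "stacking")  # or method = "pseudobma"
# print(wts)
```

Stacking weights are computed jointly over the model set, rather than from each model's elpd in isolation, which avoids the overconfidence of the simple exponential weighting.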

They don’t. They are just weights that are expected to yield the best predictions of future data. The idea of putting probabilities on models is essentially about which model’s prior predictive distribution lines up with the past data that you conditioned on. To do that, you have to make a strong assumption that one of the models is the right model (or very close to it), you have to have informative (and, I would think, preregistered) priors, and you have to use the functions in the bridgesampling package.
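If you do go the marginal-likelihood route, a minimal sketch with the bridgesampling package might look like the following (the fit objects are hypothetical rstan `stanfit` objects, and the result is only meaningful with genuinely informative priors):

```r
library(bridgesampling)

# Hypothetical: fit1 and fit2 are rstan 'stanfit' objects for H_1 and H_2,
# fitted with informative priors and enough iterations for stable estimates
# ml1 <- bridge_sampler(fit1)
# ml2 <- bridge_sampler(fit2)
# bf(ml1, ml2)         # Bayes factor p(data | H_1) / p(data | H_2)
# post_prob(ml1, ml2)  # posterior model probabilities (equal prior odds by default)
```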

Asymptotically, as n \to \infty with the number of parameters p fixed, it will approach that ratio, but we don’t recommend this interpretation in the finite-n case.

It’s better to think of Akaike-type weights in terms of predictive performance, as discussed in *Using stacking to average Bayesian predictive distributions*. See also a related vignette.

---

Thanks for the quick responses @bgoodri and @avehtari. This has clarified things for me a lot.