Model selection with loo and bridge sampling

Hi, sorry for not getting back to you earlier; this is a relevant question.

A very rough rule of thumb is that an elpd_diff larger than 2 * se_diff is bigger than most of the noise we have in evaluating elpd (which could mean that the difference is large or that there is little noise). I agree that these do not look like very big differences, but I find it more useful to look at the model weights (as you did).
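Just to make the rule of thumb concrete, here is a minimal sketch of how you could check it with ArviZ; `fit_a` and `fit_b` are hypothetical `InferenceData` objects (with log-likelihood values stored), not something from your actual analysis:

```python
# Sketch: compare models by elpd and apply the rough "elpd_diff > 2 * se_diff" check.
# `fit_a` and `fit_b` are assumed InferenceData objects with log_likelihood groups.
import arviz as az

cmp = az.compare({"model_a": fit_a, "model_b": fit_b}, ic="loo")

# `elpd_diff` is the difference to the top-ranked model, `dse` its standard error.
for name, row in cmp.iterrows():
    if row["elpd_diff"] > 2 * row["dse"]:
        print(f"{name}: elpd_diff exceeds roughly 2 * se_diff")
    else:
        print(f"{name}: difference is within the noise we expect when estimating elpd")
```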

The fact that you get different results with loo and with bridgesampling is not surprising, since the two answer quite different questions (my current best thoughts on this are at Hypothesis testing, model selection, model comparison - some thoughts). In particular, Bayes factors can behave strangely when none of your models is a good fit for the data, while loo is mostly robust to this.

The loo results you see do indeed indicate that none of the models works much better than the others in leave-one-out cross-validation (as reflected by the model weights). Averaging over the models makes sense if your goal is out-of-sample prediction.
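For reference, the stacking weights used for this kind of averaging can be obtained from the same comparison call; again a sketch with hypothetical `fit_a` and `fit_b`:

```python
# Sketch: loo-based stacking weights for model averaging, assuming `fit_a` and
# `fit_b` are hypothetical InferenceData objects with log_likelihood stored.
import arviz as az

cmp = az.compare({"model_a": fit_a, "model_b": fit_b}, ic="loo", method="stacking")

# Roughly equal weights suggest no model clearly dominates in out-of-sample prediction.
print(cmp["weight"])
```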

I don’t think so. Note that those weights do not average over parameters; they average over predictions. I would expect that many parameters are not even shared by the individual models, so I don’t think you could meaningfully define what the expected behaviour would be. What you can do is make predictions from the “ensemble” model and interpret those (e.g., how big a change the model predicts when reassigning all the subjects to one of the treatment groups).
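One simple way to get such ensemble predictions is to mix the per-model posterior predictive draws according to the stacking weights. A minimal sketch, where `pred_a`, `pred_b` (arrays of predictive draws, shape `(n_draws, n_new_obs)`, e.g. with everyone reassigned to one treatment group) and the weights are assumptions for illustration:

```python
# Sketch: draw from the "ensemble" predictive distribution by picking a model
# per draw with probability given by its stacking weight, then taking one of
# that model's posterior predictive draws.
import numpy as np

rng = np.random.default_rng(1234)

def ensemble_draws(preds, weights, n_draws=4000, rng=rng):
    """Mix per-model predictive draws (list of (n_draws_k, n_obs) arrays) by weight."""
    model_idx = rng.choice(len(preds), size=n_draws, p=weights)
    return np.stack([preds[k][rng.integers(preds[k].shape[0])] for k in model_idx])

# mixed = ensemble_draws([pred_a, pred_b], weights=[0.6, 0.4])
# Summaries of `mixed` (means, intervals) are then the ensemble's predictions.
```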

Hope that clarifies more than confuses.

Best of luck with your modelling.