I have run several mixed-effects models in brms with three categorical predictors and one continuous predictor. I used loo to compare main-effects and interaction models, with and without some of the predictors, but I'm not sure how to interpret the results.
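For reference, the comparison was done roughly like this (model1 through model6 stand in for my fitted brmsfit objects; the exact formulas aren't important here):

```r
library(brms)

# add PSIS-LOO criteria to each fitted model, then compare
model1 <- add_criterion(model1, "loo")
model2 <- add_criterion(model2, "loo")
# ... likewise for model3 through model6

loo_compare(model1, model2, model3, model4, model5, model6)
```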
       elpd_diff se_diff
model1    0.0     0.0
model2   -1.1     2.2
model3   -1.1     2.2
model4   -1.2     3.1
model5   -1.2     2.6
model6   -2.1     2.8
I find it hard to get an intuitive sense of what counts as a large elpd_diff relative to its se_diff. It looks to me like there are no substantial differences in predictive performance between the models. To check this, I used bridge sampling to compare model1 against each of the other models. The BF for model1 vs. model3 was ~14 in favor of model1, and for model1 vs. model5 it was ~1000 in favor of model1; the remaining comparisons were either weakly in favor of model1 or inconclusive.
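The bridge-sampling comparisons were along these lines (a sketch, assuming the models were fitted with `save_pars = save_pars(all = TRUE)`, which `bayes_factor` requires):

```r
# pairwise Bayes factors via bridge sampling;
# each call reports evidence for the first model over the second
bayes_factor(model1, model3)
bayes_factor(model1, model5)
```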
My next idea was to average over the models. I used the loo_model_weights function with the pseudo-BMA method to obtain model weights:
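Concretely, something like this (a sketch; with the default Bayesian bootstrap this gives pseudo-BMA+ weights, as the output below reports):

```r
loo_model_weights(model1, model2, model3, model4, model5, model6,
                  method = "pseudobma")
```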
Method: pseudo-BMA+ with Bayesian bootstrap
------
       weight
model1 0.341
model2 0.123
model3 0.131
model4 0.161
model5 0.119
model6 0.124
Given these results, it seems to me that no single model clearly predicts best. Intuitively, the best approach would then be to average over all of the models. Does this sound sensible?
Is it possible to use the hypothesis function to perform testing on the averaged parameters?
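In case it helps frame the question, what I have in mind is something like the following (a sketch, assuming `posterior_average` accepts the same weighting method and that `"b_x"` stands for a population-level coefficient shared by all models):

```r
# average posterior draws across models, weighted by pseudo-BMA+,
# for a parameter that appears in every model
avg_draws <- posterior_average(model1, model2, model3,
                               model4, model5, model6,
                               variable = "b_x",
                               weights = "pseudobma")

# summarize the model-averaged posterior for that coefficient
quantile(avg_draws$b_x, c(0.025, 0.5, 0.975))
```

My understanding is that `hypothesis` also has a method for plain data frames of draws, so something like `hypothesis(avg_draws, "b_x > 0")` might work on the averaged draws, but I'm not sure whether that is the intended usage.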