Hello. I am an ecologist trying to get to grips with Bayesian model comparison for an analysis, but I’m finding it quite difficult to decide on the appropriate method for my needs.

I was wondering if you would be willing to offer some advice if I outline our aims and the methods I have tried so far?

Our overall question is whether species traits can predict the global impact of pathogen species within a genus of plant pathogens (Phytophthora).

Within this analysis, I would also like to ask whether models using trait syndromes as predictors (i.e. axes of trait space that capture important covariance among traits) can outperform models with individual traits. The key point here is that I want to compare both nested and non-nested models within the same framework.

I currently have a total of 659 candidate models, covering all possible subsets of predictors (with either individual traits or trait syndromes as predictors), and I would like to rank these models based on their predictive performance. The models were fitted in R using Paul Buerkner’s brms package as an interface to Stan. The models have a negative binomial response (impact). They include phylogenetically correlated random intercepts to account for non-independence due to phylogenetic relatedness among species. The observations are also weighted, because our knowledge about species impacts tends to accumulate over time and is usually less reliable for more recently described species. I therefore wanted to place greater weight on observations for species that we have known about for longer.
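To make the model structure concrete, here is a minimal sketch of the kind of brms call I mean. The data are simulated and every name in it (`impact`, `trait1`, `trait2`, `years_known`, `w`, `species`, `A`, `fit_impact_model`) is a placeholder rather than one of my actual variables. The weighting rule shown is an arbitrary stand-in as well.

```r
# Sketch only: simulated data and placeholder names throughout.
set.seed(1)
n_species <- 30

# Random stand-in phylogeny and its correlation matrix (requires the ape package)
tree <- ape::rcoal(n_species)
A <- ape::vcv(tree, corr = TRUE)

dat <- data.frame(
  species     = tree$tip.label,
  trait1      = rnorm(n_species),
  trait2      = rnorm(n_species),
  years_known = sample(5:150, n_species, replace = TRUE)
)

# Overdispersed count response standing in for "impact"
dat$impact <- rnbinom(n_species, mu = exp(1 + 0.5 * dat$trait1), size = 2)

# Placeholder weighting: species known for longer get more weight
dat$w <- dat$years_known / max(dat$years_known)

# One candidate model: negative binomial response, observation weights,
# and a phylogenetically correlated random intercept
fit_impact_model <- function(data, A) {
  brms::brm(
    impact | weights(w) ~ trait1 + trait2 + (1 | gr(species, cov = A)),
    data   = data,
    data2  = list(A = A),
    family = brms::negbinomial(),
    chains = 4
  )
}
```

Calling `fit_impact_model(dat, A)` compiles and samples the model. The other candidates differ only in which trait or syndrome terms appear on the right-hand side.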

Initially I followed the guidance in this paper https://arxiv.org/abs/1507.04544 to use PSIS-LOO, but the Pareto k diagnostics suggested the LOO-CV approximations wouldn’t be robust for a lot of my models. I then used WAIC to rank these models, but your paper here https://arxiv.org/abs/1503.08650 suggested this could introduce bias when many models are compared. I am now thinking of using Bayes factors to compare models instead. What would you recommend? Given the structure of my models (weighting and a correlated random effect), are there any issues I should be aware of when choosing and interpreting these model comparison metrics?
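For reference, the comparison step described above can be sketched as follows for a pair of fitted models. The helper names (`count_high_k`, `compare_two_models`) are hypothetical, and the 0.7 cut-off is an assumption taken from the commonly quoted Pareto k threshold rather than anything specific to my analysis.

```r
# Sketch only: helper names are hypothetical.

# Count observations whose Pareto k estimate exceeds a threshold,
# given a loo object (which stores k values in $diagnostics$pareto_k)
count_high_k <- function(loo_obj, threshold = 0.7) {
  k <- loo_obj$diagnostics$pareto_k
  sum(k > threshold)
}

# Compare two fitted brmsfit objects with PSIS-LOO and WAIC
compare_two_models <- function(fit_a, fit_b) {
  loo_a  <- brms::loo(fit_a)
  loo_b  <- brms::loo(fit_b)
  waic_a <- brms::waic(fit_a)
  waic_b <- brms::waic(fit_b)

  list(
    high_k_a     = count_high_k(loo_a),   # unreliable LOO terms in model A
    high_k_b     = count_high_k(loo_b),   # unreliable LOO terms in model B
    loo_ranking  = brms::loo_compare(loo_a, loo_b),
    waic_ranking = brms::loo_compare(waic_a, waic_b)
  )
}
```

In my case the counts of high k values were non-trivial for many of the candidate models, which is what prompted the question.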

I would be happy to provide any additional information about the models if that would be helpful.

Best wishes and thank you in advance.