There is a new CRAN release of the loo package, v2.10.0. The most important new feature is the new loo_compare() output with columns p_worse, diag_diff and diag_elpd, based on the paper “Uncertainty in Bayesian leave-one-out cross-validation based model comparison”. In addition, the returned object is now a data.frame instead of a matrix.
This is great, Aki! I have a question about p_worse. I fit two models for a compositional predictor: one without an intercept and with free coefficients, and one with an intercept and sum-to-zero coefficients. The predictions from the two models are essentially identical, but I get p_worse = 1. Why is that?
> loo_compare(loos)
model          elpd_diff se_diff p_worse diag_diff       diag_elpd
zero_sum             0.0     0.0      NA
drop_intercept      -0.1     0.0    1.00 |elpd_diff| < 4
Good question! Even though elpd_diff is very small here, if se_diff is small enough relative to it, the probability can still end up close to 1. In the documentation, we say something about being cautious with p_worse when you see the |elpd_diff| < 4 diagnostic (which you have here in the diag_diff column):