Where did loo difference estimates go?

Hi @paul.buerkner and friends,

With the brms version 2.8.0 update, is it no longer possible to get a loo difference estimate for two models? I’ve been playing around with loo(), add_criterion() and loo_compare() and the closest I’m getting is elpd_diff. Same deal for waic.

I miss my loo difference estimates.

Wistfully,
Solomon

@avehtari didn’t like them anymore ;-)

To bring them back, use

print(loo_compare(...), simplify = FALSE)

You can also see that the entire matrix is actually returned if you look at str(loo_compare(...)).
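
For instance, here’s a minimal sketch (assuming fit1 and fit2 are brmsfit objects that already have the loo criterion added):

comp <- loo_compare(fit1, fit2, criterion = "loo")
str(comp)                      # the full comparison matrix is stored here
print(comp, simplify = FALSE)  # prints every column, not just elpd_diff and se_diff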

I guess you are looking for something else, as elpd_diff is the loo difference.

Thank you for chiming in, @avehtari. I don’t think I follow. The elpd_diff appears to be the difference between the two elpd_loo estimates. In my case, that’s -116.1 - -111.8 = -4.3. That difference estimate is clearly in a different metric than what one might get from the looic estimates, in my case 232.1 - 223.5 = 8.6. So if you’re calling the elpd_diff the “loo difference”, then yes, I’m looking for something else, namely the difference score in the metric of the loo estimates, which appears to be the looic.

Thanks for the direction, @paul.buerkner. When I use the print(loo_compare(...), simplify = FALSE) method, I get a wealth of output, including the looic estimates and their standard errors. I can manually compute a difference score for the estimates, but I don’t see a clear way to get the standard error for that difference. What am I missing?

Multiply elpd_diff by -2 and the corresponding standard error by 2.
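
For example, a quick sketch (assuming ic holds the matrix returned by loo_compare() for your two models):

ic <- loo_compare(fit1, fit2, criterion = "loo")
looic_diff    <- ic[, "elpd_diff"] * -2  # difference on the looic scale
looic_diff_se <- ic[, "se_diff"]   *  2  # standard error of that difference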


That’s easy enough. Thanks, @paul.buerkner.

I understand these changes in the brms output are in accordance with @avehtari’s recommendations, which seems like a good thing. But from a user perspective, the change seems abrupt. Are they spelled out somewhere in the brms documentation?

The changes primarily happened in loo, not in brms.

Is there a place in the loo documentation that the brms documentation might direct brms users to in order to help them understand the changes?

Not sure. Maybe @jonah knows.

Okay. Thank you for the networking.

I’ll just add that the problem with the previous output was that there were values which were exactly the same except for the multiplier -2. So the output had more to read, but no actual additional information. As we wanted to add more useful information, we decided to drop the columns multiplied by -2. At the same time, there have been attempts to make the brms, rstanarm, and rstan loo outputs consistent. Sorry for the lack of communication. I had been waiting for the new rstanarm version before advertising the new loo, but the new brms release came first. I don’t personally like multiplying log densities by -2, as it is arbitrary in the Bayesian context (it used to make some sense for Gaussian models and maximum likelihood).

That all makes sense. Thank you for the clarification.

If no one minds, I’d like to share a link to this exchange on Twitter. It might help other confused users.

Sure, go ahead. This is a public forum, and it’s great if you find something useful to share. loo 2.1.0 was recently released, and the NEWS file mentions several changes, but unfortunately it doesn’t explicitly mention that something was removed from the default output.


Thanks, Aki, for providing the explanation! Sorry, @Solomon, for my brief and barely helpful answers today; I sent them in the middle of some talks. Perhaps we could add Aki’s explanation somewhere in the documentation so that users can understand the history of our changes.

I had to release brms earlier than planned because it was failing under R-devel on CRAN, and they required me to do a new release.

No problem, @paul.buerkner. We all have day jobs.

I think linking to one of @avehtari’s talks is a great idea. I’ve linked to his GPSS2017 workshop presentation, On Bayesian model selection and model averaging, before. Would the Model assessment, selection and averaging talk from November 2018 be a better choice?


Much better, as it’s more up to date.


Now that I’ve had some time to play with the new brms/loo workflow, I thought it might be helpful to others to see what it can look like in practice. Here I’ll compare fit1 and fit2 from @avehtari and @jonah’s vignette, Bayesian Stacking and Pseudo-BMA weights using the loo package, which are themselves based on models from McElreath’s Statistical Rethinking, Chapter 6.

First, load the packages and format the data.

library(tidyverse)
library(rethinking)  # provides the milk data
library(brms)
data(milk)

milk <- 
  milk %>%
  drop_na() %>%  # keep complete cases only (17 rows), as in the vignette
  mutate(neocortex = neocortex.perc / 100)

Fit the two models.

fit1 <- 
  brm(data = milk, family = gaussian,
      kcal.per.g ~ 1,
      seed = 1)

fit2 <- 
  brm(data = milk, family = gaussian,
      kcal.per.g ~ 1 + neocortex,
      seed = 1)

Here we’ll compute the LOO and WAIC for both models and then save the results within the brmsfit objects.

fit1 <- add_criterion(fit1, c("waic", "loo"))
fit2 <- add_criterion(fit2, c("waic", "loo"))

You can access the results like this.

fit1$loo
Computed from 4000 by 17 log-likelihood matrix

         Estimate  SE
elpd_loo      4.4 1.9
p_loo         1.3 0.3
looic        -8.8 3.7
------
Monte Carlo SE of elpd_loo is 0.0.

All Pareto k estimates are good (k < 0.5).
See help('pareto-k-diagnostic') for details.
fit2$waic
Computed from 4000 by 17 log-likelihood matrix

          Estimate  SE
elpd_waic      3.6 1.6
p_waic         1.9 0.3
waic          -7.2 3.2

To compare models by their information criteria, use the loo_compare() function. Since it’ll come in handy in a bit, we’ll first save the results as ic.

ic <- loo_compare(fit1, fit2, criterion = "waic")

Here’s the default output.

print(ic)
     elpd_diff se_diff
fit1  0.0       0.0   
fit2 -0.7       0.6  

However, there’s more information available.

print(ic, simplify = FALSE)
     elpd_diff se_diff elpd_waic se_elpd_waic p_waic se_p_waic waic se_waic
fit1  0.0       0.0     4.4       1.9          1.3    0.3      -8.7  3.7   
fit2 -0.7       0.6     3.6       1.6          1.9    0.3      -7.3  3.2 

Even with all that output, you’ll notice that the difference score and its standard error are in the elpd metric, rather than the conventional information criteria metric. Here’s a quick way to do the conversion.

cbind(waic_diff = ic[, 1] * -2,
      se        = ic[, 2] *  2)
     waic_diff       se
fit1  0.000000 0.000000
fit2  1.416582 1.153017