# Quick examples of loo() interpretation

As a newbie, I find loo() comparisons confusing. Could you please comment briefly on the examples below?

Example 1: Is m2 better, since elpd_diff is more than 3 times se_diff? Or is the difference insignificant because the numbers are so small?

```
Model comparisons:
   elpd_diff se_diff
m1   0.0       0.0
m2  -0.5       0.1
```

Example 2: Is m2 worse, since elpd_diff is more than 3 times se_diff?

```
Model comparisons:
   elpd_diff se_diff
m1   0.0       0.0
m2 -15.5       5.1
```

Example 3: Are the models equivalent, since elpd_diff is not 3-5 times larger than se_diff? And what about such a large se_diff?

```
Model comparisons:
   elpd_diff se_diff
m1   0.0       0.0
m2  -0.3     182.4
```

Hi di4oi4,

In Example 1, m1 is actually the better model. Your reasoning is otherwise correct: m1 outperforms m2 by more than 3 times the `se_diff`, if that is the criterion you set for calling a model “better”. That the numbers are small does not matter, because the absolute values of `elpd` are not meaningful in themselves, and likewise we cannot judge whether `elpd_diff` is “small” or “large” merely by looking at it. This is why we need to consider `se_diff`, as you already suggested.

EDIT: for people checking up on this in the future: @avehtari pointed out (below) that this difference can in fact be considered insignificant because the numbers are small, and that the absolute numbers can be interpreted. I am leaving the point above so as not to break the flow of the conversation, but want to point out that what I said was not 100% correct.

Example 2: Yes, this is correct.

Example 3: Yes, your interpretation is correct again. Did you perhaps fit different models or data here? An `elpd_diff` this small together with an `se_diff` this large could indicate that the two models fit the data equally well and that the absolute values of `elpd` are large, as are their standard deviations and hence `se_diff`. Again, this is not meaningful by itself, as the absolute value of `elpd` does not tell us much without something to compare it to.

If you haven’t done so already, consider having a look at the loo glossary for more information about `elpd`.


Thank you so much! This is very useful information!

A comment about the last example (3): I got such a high se_diff due to the different model specifications:

m1 = y ~ predictor + country
m2 = y ~ predictor + (1 | country)

The predictions of the two models were nearly the same, as were the pp_checks and Rhats. The only difference was that the hierarchical model had slightly higher CIs in its predictions.

Sorry, I was not very clear in my wording above: I meant “different models / data” compared to examples 1 and 2. For different data or response variables, the values of elpd can change drastically, which would explain why the se_diff (and `elpd`?) might be so much bigger here than in example 1 or 2…
Either way, your interpretations are correct I think :)


Example 1: The difference is insignificant due to the small numbers.

Example 2: Yes.

Example 3: This likely indicates model misspecification. Do posterior predictive checking.


Thanks for correcting this; I added an EDIT note to my answer above. Just for my own understanding, because I find this surprising: when would you consider `elpd_diff` values small or large? As I mentioned above, I thought that the absolute value does not matter.

See answers 11 and 15 in the CV-FAQ. TL;DR: the absolute difference has an interpretation.
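As a rough illustration of how the two criteria in this thread can be combined, here is a minimal Python sketch applied to the three example tables. The helper `interpret_comparison` is hypothetical and not part of loo or any other library. The cutoff of 4 for a "small" absolute difference is an assumption based on the rule of thumb usually quoted from the CV-FAQ, and the multiple of 3 for `se_diff` is the criterion used earlier in this thread. A very large `se_diff`, as in Example 3, is a separate warning sign that this sketch does not check for.

```python
def interpret_comparison(elpd_diff, se_diff, small_threshold=4.0, se_multiple=3.0):
    """Classify one row of a model comparison table.

    Hypothetical helper for illustration only. small_threshold=4.0 is an
    assumed rule of thumb; se_multiple=3.0 is the criterion from this thread.
    """
    magnitude = abs(elpd_diff)
    # Step 1: a small absolute difference is treated as negligible,
    # no matter how small se_diff is.
    if magnitude < small_threshold:
        return "small difference"
    # Step 2: a larger difference is judged relative to its standard error.
    if magnitude > se_multiple * se_diff:
        return "clear difference"
    return "uncertain difference"


# The m2 rows from the three examples in the original question.
examples = [
    ("example 1", -0.5, 0.1),
    ("example 2", -15.5, 5.1),
    ("example 3", -0.3, 182.4),
]
results = {label: interpret_comparison(d, se) for label, d, se in examples}
for label, verdict in results.items():
    print(label, "->", verdict)
```

Checking the absolute size first is what keeps Example 1 from being called a clear win for m1, even though its `elpd_diff` is 5 times its `se_diff`.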


I did pp_checks. All except the last look similar.
Column 1: m2 = y ~ predictor + (1 | country)
Column 2: m1 = y ~ predictor + country

Why is y different in the last row?
The third row looks a bit suspicious, although not completely infeasible.
Are you using just a Gaussian model? What is y? Counts with some 0’s, too?

I don’t know why y is different in the last row, though I always used the same models for making these pp_check plots.

This is the full m2 model, with family = hurdle_lognormal. Received treatment hours is the y variable, and there’s a lot of zero inflation.

fit = brm(bf(received_treatment_hours ~ predictor1 + … + predictor9 + (1 | region), hu ~ predictor1 + … + predictor9 + (1 | region)), data = data, family = hurdle_lognormal(), cores = 3, chains = 3)

Interestingly, the predictions I am interested in are consistent between the models and also similar to the split analysis (lognormal model and binomial model). What’s also interesting is that the hierarchical structure did not give high se_diff values when comparing the lognormal/binomial models in the split analysis, but they became significant with the hurdle models.

Do you mean “they (se_diff) became high with hurdle models”? Are you certain that the observations are in the same order for both models? That one plot raises doubt, and different orders could explain the high se_diff.
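To see why observation order matters, here is a minimal Python sketch. It assumes that `elpd_diff` is the sum of the pointwise elpd differences and that `se_diff` is sqrt(n) times their standard deviation, which is how the comparison is usually described. The helper `compare_pointwise` is hypothetical, and all pointwise values are invented for illustration.

```python
import math
import statistics


def compare_pointwise(elpd_a, elpd_b):
    """Return (elpd_diff, se_diff) of model b relative to model a.

    Hypothetical helper: assumes elpd_diff = sum of paired differences and
    se_diff = sqrt(n) * sd of those differences.
    """
    if len(elpd_a) != len(elpd_b):
        raise ValueError("both models need one pointwise value per observation")
    diffs = [b - a for a, b in zip(elpd_a, elpd_b)]
    elpd_diff = sum(diffs)
    se_diff = math.sqrt(len(diffs)) * statistics.stdev(diffs)
    return elpd_diff, se_diff


# Invented pointwise elpd values for six observations (illustration only).
elpd_m1 = [-0.5, -1.0, -2.0, -4.0, -8.0, -16.0]
# Model 2 is equal to or slightly worse than model 1 on each observation.
offsets = [0.125, 0.25, 0.0, 0.25, 0.125, 0.0]
elpd_m2 = [value - offset for value, offset in zip(elpd_m1, offsets)]

# Observations paired correctly.
aligned = compare_pointwise(elpd_m1, elpd_m2)
# Same model 2 values, but stored in reverse observation order.
misaligned = compare_pointwise(elpd_m1, list(reversed(elpd_m2)))

print("aligned:    elpd_diff=%.2f se_diff=%.2f" % aligned)
print("misaligned: elpd_diff=%.2f se_diff=%.2f" % misaligned)
```

Reordering leaves the sum, and therefore `elpd_diff`, unchanged, but the paired differences become large and of mixed sign, which inflates `se_diff`. That is the same pattern as in Example 3: a tiny `elpd_diff` next to a huge `se_diff`.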

Thank you for your comments! I ran m2 again and now the se_diff values are small again, and there is no difference between the models. Thus, this seems solved now.

```
Model comparisons:
   elpd_diff se_diff
m1   0.0       0.0
m2  -0.4       1.7
```