In this case, m1 is actually the better model, as what you say is correct: it outperforms m2 by more than 3 times se_diff, if that is the criterion you set for calling a model “better”. That the numbers are small does not matter, as the absolute values of elpd are not meaningful in themselves; similarly, we cannot judge whether elpd_diff is “small” or “large” merely by looking at it. This is why we need to consider se_diff, just as you already suggested.

EDIT: for people checking up on this in the future, @avehtari pointed out (below) that this difference can in fact be considered insignificant because the numbers are small, and that it is possible to interpret the absolute numbers. I am leaving the point above as-is so as not to break the flow of the conversation, but I want to note that what I said was not 100% correct.
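As a minimal sketch of the kind of comparison being discussed (assuming two fitted brmsfit objects m1 and m2):

library(brms)

loo1 <- loo(m1)  # PSIS-LOO cross-validation for each model
loo2 <- loo(m2)
loo_compare(loo1, loo2)  # reports elpd_diff and se_diff relative to the best model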

Yes, this is correct.

Yes, your interpretation is correct again. Did you perhaps fit different models / data here? elpd_diff being so small while se_diff is so large could indicate that the two models fit the data equally well, and that the absolute values of elpd are large, and so are their standard deviations, hence se_diff as well. Again, this is not meaningful by itself, as the absolute value of elpd does not tell us much without comparing it to something.
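To make the role of se_diff concrete, here is a rough sketch of how loo_compare computes it from the pointwise elpd values (assuming the loo objects loo1 and loo2 from above):

diff_i <- loo1$pointwise[, "elpd_loo"] - loo2$pointwise[, "elpd_loo"]
elpd_diff <- sum(diff_i)  # the reported difference
se_diff <- sqrt(length(diff_i)) * sd(diff_i)  # SE of the summed pointwise differences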

If you haven’t done so already, consider having a look at the loo glossary for more information about elpd.

Sorry, I was not very clear in my wording above: I meant “different models / data” compared to examples 1 and 2. For different data or response variables, the values of elpd can change drastically, which would explain why the se_diff (and elpd?) might be so much bigger here compared to example 1 or 2…
Either way, I think your interpretations are correct :)

Thanks for correcting this; I added an EDIT note to my answer above. Just for my own understanding, because I find this surprising: when would you consider elpd_diff values small or large, then? As I mentioned above, I thought the absolute value does not matter.

Why is y different in the last row?
The third row looks a bit suspicious, although not completely infeasible.
Are you using just a Gaussian model? What is y? Counts with some 0’s, too?

I don’t know why y is different in the last row, though I always used the same models for making these pp_check plots.

This is the full m2 model, family = hurdle_lognormal. Received treatment hours is the y variable; there’s a lot of zero inflation.

fit <- brm(
  bf(received_treatment_hours ~ predictor1 + ... + predictor9 + (1 | region),
     hu ~ predictor1 + ... + predictor9 + (1 | region)),
  data = data, family = hurdle_lognormal(), cores = 3, chains = 3
)
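For reference, the pp_check plots discussed above can be produced along these lines (a sketch; ndraws applies to newer brms versions, older ones use nsamples):

pp_check(fit, ndraws = 50)  # overlay replicated densities against the observed y
pp_check(fit, type = "stat", stat = "mean")  # compare a summary statistic instead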

Interestingly, the predictions I am interested in are consistent between the models and also similar to those from the split analysis (a lognormal model plus a binomial model), sketched below. What’s also interesting is that the hierarchical structure did not give high se_diff values when comparing the lognormal/binomial models in the split analysis, but they became significant with the hurdle models.
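For clarity, the split analysis refers to fitting the two parts separately; a minimal sketch under assumptions (any_treatment is a hypothetical 0/1 indicator for non-zero hours, and bernoulli() stands in for the binomial model):

fit_pos <- brm(received_treatment_hours ~ predictor1 + ... + predictor9 + (1 | region),
               data = subset(data, received_treatment_hours > 0), family = lognormal())
fit_zero <- brm(any_treatment ~ predictor1 + ... + predictor9 + (1 | region),
               data = data, family = bernoulli())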

Do you mean “they (se_diff) became high with the hurdle models”? Are you certain that the observations are in the same order for both models? That one plot raises doubts, and a different ordering could explain the high se_diff.
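One quick way to check this (a sketch, assuming m1 and m2 are the fitted brmsfit objects):

# TRUE means both fits saw the same responses in the same order
identical(m1$data$received_treatment_hours, m2$data$received_treatment_hours)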

Thank you for your comments! I ran m2 again and now the se_diff values are small again, and there is no difference between the models. So this seems solved now.

Model comparisons:
   elpd_diff se_diff
m1  0.0       0.0
m2 -0.4       1.7