In this case, m1 is indeed the better model, and what you say is correct: it outperforms m2 by more than 3 times the se_diff, if that is the criterion you set for calling a model “better”. That the numbers are small does not matter: the absolute values of elpd are not meaningful in themselves, and likewise we cannot tell by looking at elpd_diff alone whether it is “small” or “large”. This is why we need to consider se_diff, just as you suggested.
EDIT: for people reading this in the future, @avehtari pointed out (below) that this difference can in fact be considered insignificant because the numbers are small, and that it is possible to interpret the absolute numbers. I leave the point above as it is so as not to break the conversation flow, but want to note that what I said was not 100% correct.
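To make the criterion above concrete, here is a minimal sketch of how elpd_diff and se_diff relate to the pointwise elpd values. The numbers are made up for illustration; the computation mirrors the usual definition, where se_diff is the standard error of the summed pointwise differences.

```python
import math

# Hypothetical pointwise elpd values for two models (one entry per observation).
elpd_m1 = [-1.20, -0.85, -1.05, -0.90, -1.10]
elpd_m2 = [-1.35, -0.95, -1.15, -1.00, -1.25]

n = len(elpd_m1)
diff = [a - b for a, b in zip(elpd_m1, elpd_m2)]  # pointwise elpd differences

elpd_diff = sum(diff)                               # difference of the summed elpd
mean = elpd_diff / n
var = sum((d - mean) ** 2 for d in diff) / (n - 1)  # sample variance of the differences
se_diff = math.sqrt(n * var)                        # standard error of the summed difference

# Rule of thumb discussed above: only call m1 "better" when elpd_diff is
# several standard errors (here, more than 2 * se_diff) away from zero.
m1_better = elpd_diff > 2 * se_diff
```

In this toy example elpd_diff = 0.60 while se_diff is roughly 0.06, so the difference clears the threshold comfortably even though all the absolute numbers involved are small.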
Yes, this is correct.
Yes, your interpretation is correct again. Did you perhaps fit different models / data here?
elpd_diff being so small while se_diff is so large could indicate that the two models fit the data equally well, and that the absolute values of elpd are large, so their standard deviations, and hence se_diff, are large as well. Again, this is not meaningful by itself, as the absolute value of elpd does not tell us much without comparing it to something.
If you haven’t done so already, consider having a look at the loo glossary for more information about these quantities.