Interpreting elpd_diff - loo package

OK thanks! I’ve never really written any code at all (other than specifying models for off-the-shelf R packages), but I will try my best.

What I was asking above (and sorry for being unclear again!) is how bad would it be to compare the two models using non-pointwise WAIC (or DIC); i.e., the value that comes with the model summary:

Log-likelihood at expected values: -7114.01
Deviance: 14228.02
DIC: 14823.37
Effective number of parameters (pD): 297.68
WAIC (SE): 14828.32 (170.9)
pWAIC: 292.09

I understand it’s far less accurate than the pointwise version, but is it “good enough” for comparing between two models which should be very different?

That value is the sum of the pointwise values. There is no non-pointwise version of DIC/WAIC (in theory there could be, but it would fail so often that it's best to forget the idea).
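Since the reported WAIC is just a sum of pointwise terms, the pointwise values are what let you attach an uncertainty estimate to a model comparison. A minimal sketch with the loo package, using simulated log-likelihood matrices in place of real fits (the matrices here are fake; with a real model you would extract an S-draws-by-N-observations log-likelihood matrix from the fit):

```r
library(loo)

set.seed(1)
S <- 1000; N <- 50                          # hypothetical draws and observations
log_lik1 <- matrix(rnorm(S * N, -1.0, 0.3), S, N)   # fake log-lik, model 1
log_lik2 <- matrix(rnorm(S * N, -1.1, 0.3), S, N)   # fake log-lik, model 2

waic1 <- waic(log_lik1)                     # pointwise WAIC for model 1
waic2 <- waic(log_lik2)                     # pointwise WAIC for model 2

# loo_compare() uses the pointwise values to report elpd_diff *and* its SE;
# subtracting the two printed totals by hand loses the SE entirely.
loo_compare(waic1, waic2)
```

The SE of the difference is the point: a raw difference of a few units, like the -6.8 above, may be well within one standard error of zero.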

Right - so can’t I just compare the two models on the sum of the pointwise values

waic_diff <- as.numeric(WAIC(Both_M)) - as.numeric(WAIC(Both_NO_CA_M))
waic_diff
[1] -6.837828

and interpret this like an AIC value? I know it’s not optimal - but is it a reasonably good approximation, or totally wrongheaded?

Totally wrongheaded!

I dislike all information criteria, as they so easily hide the original assumptions Akaike made, as this discussion also illustrates.

OK thanks!

Update: this is on its way with the new loo and rstanarm packages we’re releasing next week (both already on GitHub). There will be an example at help("loo", package = "rstanarm") when it comes out.

I know this is a very old post, but I am stuck on this very problem.
Apologies if this is formulated in a really unwieldy way. I have two (related) questions

Does this effectively mean that LOO-CV is not well suited for comparing two models that differ minimally (e.g., with vs. without a main-effect predictor) but that have multiple observations within clusters?

Related to this - I have models:

m1: reactiontime ~ condition + (1|Participant)
m2: reactiontime ~ 1 + (1|Participant)

Condition is a factor with two levels: Baseline and Distraction

The theory predicts that participants in the Distraction condition will react slower.

In order to investigate this, can I just “use [m1] and look at the marginal posterior of the effect” , e.g. in a psycholinguistics journal? Or do I “need” to do LOO-CV comparison?


You can also start a new thread and refer to an old post.

It is well suited if you are interested in the difference in predictive performance and leave-one-out approximates your prediction task well. You may need a different cross-validation structure if you are, for example, interested in predicting jointly for future data arriving in clusters.
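If the prediction task is "new participants" rather than "new trials from the same participants", one option is grouped k-fold, where whole clusters are held out together. A sketch using loo's fold-assignment helper on a hypothetical participant variable (with a real rstanarm fit, the resulting folds could then be passed to `kfold(fit, folds = folds)`):

```r
library(loo)

# Hypothetical clustering variable: 20 participants, 10 trials each
participant <- rep(paste0("p", 1:20), each = 10)

# Assign whole participants to folds, so no participant is split
# between training and held-out data
folds <- kfold_split_grouped(K = 5, x = participant)

# Each participant's trials all land in a single fold
table(folds, participant)[, 1:4]
```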

I don’t have any information about what psycholinguistics journals expect. If you are interested in the magnitude of the effect, and not in how well either model predicts reactiontime, then looking at the posterior is sensible. As you have just one unknown condition parameter, looking at the marginal can be informative, since the posterior dependency with the participant effects is likely to be small.
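"Looking at the marginal posterior of the effect" can be sketched as below. The draws here are fake for illustration; with a real rstanarm fit you would obtain them via something like `as.matrix(m1, pars = "conditionDistraction")` (the parameter name is hypothetical, so check `colnames(as.matrix(m1))` for your fit):

```r
set.seed(2)
# Fake posterior draws of the condition coefficient (ms scale, assumption)
beta_draws <- rnorm(4000, mean = 35, sd = 12)

quantile(beta_draws, c(0.025, 0.5, 0.975))  # median and 95% interval
mean(beta_draws > 0)                        # P(Distraction slows reactiontime)
```

If that posterior mass sits well away from zero, the marginal posterior alone already answers the theoretical question, without any LOO-CV comparison.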
