OK thanks! I’ve never really written any code at all (other than specifying models for off-the-shelf R packages), but I will try my best.
What I was asking above (and sorry for being unclear again!) is how bad would it be to compare the two models using non-pointwise WAIC (or DIC); i.e., the value that comes with the model summary:
Log-likelihood at expected values: -7114.01
Deviance: 14228.02
DIC: 14823.37
Effective number of parameters (pD): 297.68
WAIC (SE): 14828.32 (170.9)
pWAIC: 292.09
I understand it’s far less accurate than the pointwise version, but is it “good enough” for comparing two models that should be very different?
That value is the sum of the pointwise values. There is no non-pointwise version of DIC/WAIC (in theory there could be, but it would fail so often that it’s best to forget the idea).
Update: this is on its way with the new loo and rstanarm packages we’re releasing next week (both already on GitHub). There will be an example at help("loo", package = "rstanarm") when it comes out.
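For what it’s worth, here is a minimal sketch of what the pointwise comparison looks like with loo and rstanarm. All names (fit1, fit2, y, x1, x2, dat) are hypothetical, and in current loo versions the comparison function is loo_compare():

library(rstanarm)
library(loo)

# Hypothetical models: same data, with and without one predictor
fit1 <- stan_glm(y ~ x1 + x2, data = dat)
fit2 <- stan_glm(y ~ x1, data = dat)

# Pointwise PSIS-LOO estimates (preferred over plain WAIC)
loo1 <- loo(fit1)
loo2 <- loo(fit2)

# The SE of the elpd difference is computed from the pointwise values,
# which the summed WAIC/DIC printed in the model summary cannot provide
loo_compare(loo1, loo2)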
I know this is a very old post, but I am stuck on this very problem.
Apologies if this is formulated in a really unwieldy way. I have two (related) questions:
Does this effectively mean that LOO-CV is not well suited for comparing two models that differ minimally (e.g., with vs. without a main-effect predictor) but that have multiple observations within clusters?
Condition is a factor with two levels: Baseline and Distraction
The theory predicts that participants in the Distraction condition will react slower.
In order to investigate this, can I just “use [m1] and look at the marginal posterior of the effect”, e.g. in a psycholinguistics journal? Or do I “need” to do a LOO-CV comparison?
You can also start a new thread and refer to an old post.
It is well suited if you are interested in the difference in predictive performance and leave-one-out approximates your prediction task well. You may need a different cross-validation structure if you are, for example, interested in predicting jointly for future data arriving in clusters.
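If you do need the clustered prediction task, one option is K-fold cross-validation with folds defined by cluster. A rough sketch, assuming an rstanarm fit fit1 and a participant grouping variable in dat (names and K are illustrative):

library(loo)

# Keep all of a participant's observations in the same fold, so each
# held-out fold contains whole clusters (approximating prediction for
# new participants rather than for new trials of known participants)
folds <- kfold_split_grouped(K = 10, x = dat$participant)

# rstanarm refits the model K times, leaving out one fold at a time
kf <- kfold(fit1, folds = folds)
kf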
I don’t have any information about what psycholinguistics journals expect. If you are interested in the magnitude of the effect, and not in how well either model predicts the reaction time, then looking at the posterior is sensible. As you have just one unknown condition parameter, looking at the marginal can be informative, as the posterior dependency with the participant effects is likely to be small.
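As a concrete illustration of looking at that marginal posterior, assuming a hypothetical rstanarm fit fit1 of rt ~ condition + (1 | participant), where the coefficient would be named conditionDistraction:

# 95% posterior interval for the condition effect
posterior_interval(fit1, pars = "conditionDistraction", prob = 0.95)

# Or work with the draws directly, e.g., the posterior probability
# that Distraction slows reaction times
draws <- as.matrix(fit1, pars = "conditionDistraction")
mean(draws > 0)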