Pardon if this has been brought up before, but I couldn’t find anything exactly like my query.
I’m interested in using loo with a model that includes random site-level effects, which here is the lowest level at which the likelihood factorises (shoutout to @jsocolar for clearly explaining this in the {flocker} paper). As expected, there are a bunch of high Pareto k values for the models with these site-level effects. Integrating them out feels a bit daunting because there are multiple site-level effects in this dynamic occupancy model, so I thought I’d have a go at K-fold cross-validation. However, in the vignette, it appears that in the generated quantities you compute the log_lik for each held-out observation, but those observations wouldn’t have a corresponding site-level random effect. So do you still have to integrate them out here? Or do you predict a new random effect for each observation with a _rng() call or something?
Yes. You assume that you are predicting for a site from which you did not have observations, and thus you draw the site-specific parameters (aka “random effects”) from the prior using _rng(), which corresponds to integrating over them.
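For example, in the generated quantities block of the K-fold model, this could look roughly like the following (a sketch with a simplified normal observation model; names like `sigma_site` are placeholders, not your occupancy model):

```stan
generated quantities {
  vector[N_holdout] log_lik;
  for (j in 1:N_holdout) {
    // the held-out site was not used in fitting, so draw its
    // site-specific effect from the population distribution;
    // averaging log_lik over posterior draws then integrates it out
    real z_site = normal_rng(0, sigma_site);
    log_lik[j] = normal_lpdf(y_holdout[j] | alpha + z_site, sigma);
  }
}
```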
Just a quick follow-up. In section 5 of this roaches example, the observation-level random effects are integrated out. Would it be analogous to use _rng() functions in the log_lik computation for each observation, instead of the actual posterior draws for each observation?
For instance, imagine eta[i] is the random effect for each observation i. The log_lik might look something like this:
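Something along these lines (a sketch; the linear predictor and parameter names are just placeholders from a generic model):

```stan
generated quantities {
  vector[N] log_lik;
  for (i in 1:N) {
    // fresh draw of the observation-level effect instead of the posterior draw
    real eta_i = normal_rng(0, sigma_eta);
    log_lik[i] = normal_lpdf(alpha + beta * x[i] + eta_i, sigma);
  }
}
```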
Analogous, but you would be replacing quadrature integration (with a user-defined error threshold) with just one random draw from the conditional distribution of eta[i], so the performance in PSIS-LOO would be bad. There is a paper proposing sampling many draws, but that is less efficient than the quadrature.
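For comparison, the quadrature approach in that roaches example looks roughly like this (a sketch; the names and the Poisson observation model here are illustrative, not your exact model):

```stan
functions {
  // integrand for marginalizing out one observation-level effect eta:
  // Normal(eta | 0, sigma) * Poisson_log(y_i | mu_i + eta)
  real integrand(real eta, real xc, array[] real theta,
                 array[] real x_r, array[] int x_i) {
    real sigma = theta[1];
    real mu = theta[2];
    return exp(normal_lpdf(eta | 0, sigma)
               + poisson_log_lpmf(x_i[1] | mu + eta));
  }
}
generated quantities {
  vector[N] log_lik;
  for (i in 1:N) {
    // adaptive quadrature with a user-defined relative tolerance;
    // in practice you would also shift by the max log density to avoid underflow
    log_lik[i] = log(integrate_1d(integrand,
                                  negative_infinity(), positive_infinity(),
                                  {sigma_eta, mu[i]}, {0.0}, {y[i]}, 1e-8));
  }
}
```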
However, if you consider this for K-fold-CV (with running MCMC separately for each fold) the accuracy can be sufficient.
Your code examples are missing the first argument of normal_lpdf(), and you are using x[i], which looks like the “training” data and not the left-out data, so it’s not clear whether you are still asking about K-fold-CV.
Sorry for the confusion. I was talking about just computing log_lik[i] for each observation using all of the data, so no K-fold. And sorry about the wrong code; what I meant to write was this:
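(Sketching again with placeholder names; `sigma_eta` is the population SD of the observation-level effects in my model:)

```stan
generated quantities {
  vector[N] log_lik;
  for (i in 1:N) {
    // draw eta for observation i from its population distribution
    // instead of using the posterior draw of eta[i]
    real eta_tilde = normal_rng(0, sigma_eta);
    log_lik[i] = normal_lpdf(y[i] | alpha + beta * x[i] + eta_tilde, sigma);
  }
}
```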
Yes, this works in theory, but if log_lik is used in PSIS-LOO (or WAIC) it usually fails in practice, as a one-draw estimate of the integral is too noisy.