Pardon if this has been brought up before, but I couldn’t find anything exactly like my query.
I’m interested in using loo with a model that includes random site-level effects, which here is the lowest level at which the likelihood factorises (shoutout to @jsocolar for clearly explaining this in the {flocker} paper). As expected, there are a bunch of high Pareto k values for the models with these site-level effects. Integrating them out feels a bit daunting because there are multiple site-level effects in this dynamic occupancy model, so I thought I’d have a go at K-fold cross-validation. However, in the vignette, it appears that in the generated quantities you compute the log_lik for each held-out observation, but those observations wouldn’t have a corresponding site-level random effect. So do you still have to integrate them out here? Or do you predict a new random effect for each observation with a _rng() call or something?
Yes. You assume that you are predicting for a site from which you did not have observations, and thus you draw the site-specific parameters (aka “random effects”) from the prior using _rng(), which corresponds to integrating over them.
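For example, in the generated quantities block of the K-fold model, this could look roughly like the following (a sketch with a simplified normal observation model; names like `sigma_site` are placeholders, not your occupancy model):

```stan
generated quantities {
  vector[N_holdout] log_lik;
  for (j in 1:N_holdout) {
    // the held-out site was not used in fitting, so draw its
    // site-specific effect from the population distribution;
    // averaging log_lik over posterior draws then integrates it out
    real z_site = normal_rng(0, sigma_site);
    log_lik[j] = normal_lpdf(y_holdout[j] | alpha + z_site, sigma);
  }
}
```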
Just a quick follow-up. In section 5 of this roaches example, the observation-level random effects are integrated out. Would it be analogous to use _rng() functions in the log_lik computation for each observation, instead of the actual posterior draws for each observation?
For instance, imagine eta[i] is the random effect for each observation i. The log_lik might look something like this:
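Something along these lines (a sketch; the linear predictor and parameter names are just placeholders from a generic model):

```stan
generated quantities {
  vector[N] log_lik;
  for (i in 1:N) {
    // fresh draw of the observation-level effect instead of the posterior draw
    real eta_i = normal_rng(0, sigma_eta);
    log_lik[i] = normal_lpdf(alpha + beta * x[i] + eta_i, sigma);
  }
}
```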
Analogous, but you would be replacing quadrature integration (with a user-defined error threshold) with just one random draw from the conditional distribution of eta[i], so the performance in PSIS-LOO would be bad. There is a paper proposing sampling many draws, but that is less efficient than the quadrature.
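For comparison, the quadrature approach in that roaches example looks roughly like this (a sketch; the names and the Poisson observation model here are illustrative, not your exact model):

```stan
functions {
  // integrand for marginalizing out one observation-level effect eta:
  // Normal(eta | 0, sigma) * Poisson_log(y_i | mu_i + eta)
  real integrand(real eta, real xc, array[] real theta,
                 array[] real x_r, array[] int x_i) {
    real sigma = theta[1];
    real mu = theta[2];
    return exp(normal_lpdf(eta | 0, sigma)
               + poisson_log_lpmf(x_i[1] | mu + eta));
  }
}
generated quantities {
  vector[N] log_lik;
  for (i in 1:N) {
    // adaptive quadrature with a user-defined relative tolerance;
    // in practice you would also shift by the max log density to avoid underflow
    log_lik[i] = log(integrate_1d(integrand,
                                  negative_infinity(), positive_infinity(),
                                  {sigma_eta, mu[i]}, {0.0}, {y[i]}, 1e-8));
  }
}
```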
However, if you consider this for K-fold-CV (with running MCMC separately for each fold) the accuracy can be sufficient.
Your code examples are missing the first argument of normal_lpdf(), and you are using x[i], which looks like the “training” data and not the left-out data, so it’s not clear whether you are still asking about K-fold-CV.
Sorry for the confusion. I was talking about just computing log_lik[i] for each observation using all of the data, so no K-fold. And sorry about the wrong code; what I meant to write was this:
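(Sketching again with placeholder names; `sigma_eta` is the population SD of the observation-level effects in my model:)

```stan
generated quantities {
  vector[N] log_lik;
  for (i in 1:N) {
    // draw eta for observation i from its population distribution
    // instead of using the posterior draw of eta[i]
    real eta_tilde = normal_rng(0, sigma_eta);
    log_lik[i] = normal_lpdf(y[i] | alpha + beta * x[i] + eta_tilde, sigma);
  }
}
```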
Yes, this works in theory, but if log_lik is used in PSIS-LOO (or WAIC) it usually fails in practice, as a one-draw estimate of the integral is too noisy.