Approximating leave one cluster out cross validation

cfhammill · June 7, 2018, 9:26pm

Hi Stan folks, a coworker of mine recently reminded me of leave-one-cluster-out (LOcO) cross validation for comparing hierarchical models. Sophia Rabe-Hesketh and Dan Furr gave an interesting talk on this at StanCon Asilomar.

My recollection of their technique was that they used quadrature to produce the marginal likelihoods needed for computing the expected log predictive density.

My question is, can we approximate LOcO from the standard log-likelihood of (say) rstanarm? For each data point, the importance weight would be the inverse of the joint likelihood of all the points in its cluster; Pareto smoothing can then be applied to the weights as in LOO. Or is this only possible for a single held-out point?
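To make the proposal concrete, here is a minimal sketch of the raw (unsmoothed) importance-sampling version. It assumes an S x N matrix of pointwise log-likelihoods (draws by observations) that conditions on the cluster-specific parameters, plus a vector of cluster labels. The function names are hypothetical, and Pareto smoothing is deliberately left out.

```python
import numpy as np

def log_mean_exp(a, axis=0):
    """Numerically stable log(mean(exp(a))) along an axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    out = a_max + np.log(np.mean(np.exp(a - a_max), axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)

def cluster_log_lik(log_lik, cluster_ids):
    """Sum pointwise log-likelihoods within each cluster.

    log_lik: (S, N) array; cluster_ids: length-N labels (assumed input format).
    Returns the sorted unique labels and an (S, J) array of joint
    log-likelihoods, one column per cluster.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    labels, idx = np.unique(np.asarray(cluster_ids), return_inverse=True)
    joint = np.stack(
        [log_lik[:, idx == j].sum(axis=1) for j in range(labels.size)], axis=1
    )
    return labels, joint

def raw_is_loco_elpd(log_lik, cluster_ids):
    """Raw importance-sampling estimate of the held-out log density per cluster."""
    labels, joint = cluster_log_lik(log_lik, cluster_ids)
    log_ratios = -joint  # log of 1 / p(y_cluster | theta^s)
    # Self-normalised IS reduces to a harmonic mean of the joint likelihoods.
    return labels, -log_mean_exp(log_ratios, axis=0)
```

The quantity `log_ratios` is what Pareto smoothing would be applied to. Whether those ratios are well enough behaved for this to work is exactly the open question.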

Thanks,

Chris

bgoodri · June 7, 2018, 10:01pm

You need quadrature except in the case where the outcome is Gaussian. You need to be able to write a log-likelihood function without cluster-specific parameters.
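As a rough illustration of what "quadrature" means here, the sketch below integrates a single cluster-specific intercept out of a Bernoulli-logit likelihood with Gauss-Hermite quadrature. The model (one normal random intercept per cluster), the function name, and the argument names are assumptions made for the example, not the code from the talk.

```python
import numpy as np

def marginal_cluster_log_lik(y, eta, sigma_b, n_nodes=30):
    """Approximate log of the integral over b of
    prod_i Bernoulli(y_i | inv_logit(eta_i + b)) * Normal(b | 0, sigma_b^2).

    y: 0/1 outcomes for one cluster; eta: linear predictor without the
    cluster intercept; sigma_b: standard deviation of the intercept.
    """
    y = np.asarray(y, dtype=float)
    eta = np.asarray(eta, dtype=float)
    # Nodes and weights for integrals against exp(-x^2).
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma_b * x             # change of variables to N(0, sigma_b^2)
    lin = eta[None, :] + b[:, None]            # (n_nodes, n_obs)
    log_p = y[None, :] * lin - np.logaddexp(0.0, lin)   # Bernoulli-logit log pmf
    log_terms = np.log(w) - 0.5 * np.log(np.pi) + log_p.sum(axis=1)
    m = log_terms.max()
    return m + np.log(np.exp(log_terms - m).sum())      # log-sum-exp over nodes
```

Evaluating this once per posterior draw and per cluster gives a log-likelihood that no longer depends on the cluster-specific parameter, which is the ingredient described above.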

cfhammill · June 8, 2018, 1:51pm

Ahh, I think that makes sense. How do you do it in the Gaussian case?

bgoodri · June 8, 2018, 2:50pm

It depends on how complicated the cluster-specific part of the model is, but the equations / code are probably in the StanCon repo.
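For the simplest case, a single normal random intercept per cluster with Gaussian noise, the intercept integrates out in closed form: the cluster's outcomes are jointly multivariate normal with covariance sigma^2 I + tau^2 11'. The sketch below assumes that model; the function and argument names are hypothetical and are not taken from the StanCon repo.

```python
import numpy as np

def gaussian_marginal_cluster_log_lik(y, mu, sigma, tau):
    """Log density of one cluster with the random intercept integrated out.

    Assumed model: y_i = mu_i + b + e_i, b ~ Normal(0, tau^2),
    e_i ~ Normal(0, sigma^2), so y ~ MVN(mu, sigma^2 I + tau^2 11').
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    n = y.size
    cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
    resid = y - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)
```

With random slopes or several grouping factors the covariance has more structure, but the same multivariate normal form applies.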

avehtari · June 8, 2018, 7:04pm

In theory we could approximate LOcO with IS, but removing all observations related to the local parameters is likely to change the posterior too much unless the hierarchical prior is very strong and the data are weak.

And this is then the computationally more involved but more stable approach.

cfhammill · June 12, 2018, 3:34pm

Thanks Ben and Aki,

So if I’m understanding correctly, changing the posterior too much will result in noisy importance weights that won’t be well approximated by the generalized Pareto distribution.

I’ll try to wrap my head around the quadrature approach in this case then.

avehtari · June 12, 2018, 7:36pm

A more accurate description would be: changing the posterior too much will result in larger variation in the raw importance ratios. If that variation is so large that the distribution of the ratios has infinite mean and variance, then it doesn’t matter even if we can approximate that distribution well with the generalized Pareto distribution, as the mean of that approximation will be infinite, too.
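A small simulation can illustrate the point about tail shape. The sketch below uses the Hill estimator as a simple stand-in for the tail shape k; it is not the generalized Pareto fit used in Pareto smoothing, and the simulated "ratios" are exact Pareto draws rather than ratios from a real model. For a tail with shape k, the variance is finite only when k < 1/2 and the mean only when k < 1.

```python
import numpy as np

def hill_tail_index(ratios, n_tail):
    """Hill estimate of the tail shape k from the n_tail largest ratios."""
    r = np.sort(np.asarray(ratios, dtype=float))
    threshold = r[-n_tail - 1]          # largest value below the tail sample
    return np.mean(np.log(r[-n_tail:] / threshold))

rng = np.random.default_rng(1)
# rng.pareto(a) + 1 is a classical Pareto with shape a, i.e. tail shape k = 1/a.
heavy = rng.pareto(1.0 / 0.9, size=4000) + 1.0   # k = 0.9: infinite variance
light = rng.pareto(1.0 / 0.2, size=4000) + 1.0   # k = 0.2: finite variance

k_heavy = hill_tail_index(heavy, n_tail=400)
k_light = hill_tail_index(light, n_tail=400)
print(f"heavy tail: k approx {k_heavy:.2f}; light tail: k approx {k_light:.2f}")
```

As k approaches 1 the importance-sampling average stops being usable, however well the tail itself is described.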
