Hi Stan folks, a coworker of mine recently reminded me of leave-one-cluster-out (LOcO) cross-validation for comparing hierarchical models. Sophia Rabe-Hesketh and Dan Furr gave an interesting talk on this at StanCon Asilomar.

My recollection of their technique was that they used quadrature to produce the marginal likelihoods needed for computing the expected log predictive density.

My question is: can we approximate LOcO from the standard pointwise log-likelihood of (say) rstanarm? For each cluster, the importance weight would be the inverse of the joint likelihood of all the points in that cluster, and then Pareto smoothing can be applied to the weights as in LOO. Or is this only possible for a single held-out point?
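To make the question concrete, here is a minimal sketch of the idea in Python with NumPy. It assumes you have extracted an S x N matrix of pointwise log-likelihood draws (called `log_lik` here) and a length-N vector of cluster labels (`cluster`); both names and the function `loco_elpd_is` are hypothetical, not part of rstanarm or loo. It uses plain self-normalized importance sampling with no Pareto smoothing, so it only shows where the raw weights come from.

```python
import numpy as np

def logsumexp(a):
    """Numerically stable log(sum(exp(a)))."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def loco_elpd_is(log_lik, cluster):
    """Plain IS estimate of leave-one-cluster-out elpd, one value per cluster.

    log_lik : (S, N) pointwise log-likelihood draws (hypothetical input)
    cluster : (N,) cluster label for each observation (hypothetical input)
    """
    log_lik = np.asarray(log_lik, dtype=float)
    cluster = np.asarray(cluster)
    n_draws = log_lik.shape[0]
    labels = np.unique(cluster)
    elpd = np.empty(len(labels))
    for i, c in enumerate(labels):
        # Joint log-likelihood of every point in cluster c, per posterior draw.
        joint = log_lik[:, cluster == c].sum(axis=1)
        # Raw log importance ratios: inverse of the joint likelihood.
        log_w = -joint
        # Self-normalized IS: sum_s w_s * p(y_c | theta_s) / sum_s w_s
        # simplifies to S / sum_s w_s because w_s * p(y_c | theta_s) = 1.
        elpd[i] = np.log(n_draws) - logsumexp(log_w)
    return labels, elpd
```

Summing over a whole cluster makes the log ratios much more variable than in pointwise LOO, which is exactly where this estimator can break down. In practice the raw `log_w` would be passed through Pareto smoothing and the diagnostics checked before trusting the result.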

You need quadrature except in the case where the outcome is Gaussian. You need to be able to write a log-likelihood function without cluster-specific parameters.
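As a rough illustration of what "a log-likelihood function without cluster-specific parameters" can look like, here is a sketch for a random-intercept logistic model, where the intercept b ~ N(0, sigma^2) is integrated out with (non-adaptive) Gauss-Hermite quadrature. The model, the function name `marginal_loglik_cluster`, and its arguments are assumptions made for the example; the approach from the talk may well use adaptive quadrature instead.

```python
import numpy as np

def marginal_loglik_cluster(y, eta, sigma, n_nodes=25):
    """log p(y_c | beta, sigma) for one cluster, random intercept integrated out.

    y     : (n_c,) binary outcomes in the cluster
    eta   : (n_c,) fixed-effects linear predictor (x @ beta) for those rows
    sigma : standard deviation of the random intercept
    """
    y = np.asarray(y, dtype=float)
    eta = np.asarray(eta, dtype=float)
    # Nodes and weights for integrals of the form  int exp(-x^2) f(x) dx.
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables b = sqrt(2) * sigma * x turns the N(0, sigma^2)
    # expectation into (1 / sqrt(pi)) * sum_k w_k f(b_k).
    b = np.sqrt(2.0) * sigma * x
    lin = eta[:, None] + b[None, :]                      # (n_c, n_nodes)
    # Bernoulli-logit log-likelihood, summed over the cluster at each node.
    ll = (y[:, None] * lin - np.logaddexp(0.0, lin)).sum(axis=0)
    # Weighted log-sum-exp over the quadrature nodes.
    a = ll + np.log(w) - 0.5 * np.log(np.pi)
    m = a.max()
    return m + np.log(np.exp(a - m).sum())
```

Evaluating this for each cluster at each posterior draw of the fixed effects and `sigma` gives a cluster-level log-likelihood matrix (draws x clusters) that no longer depends on the cluster's own intercept.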

In theory we could approximate LOcO with IS, but removing all the observations related to the local parameters is likely to change the posterior too much, unless the hierarchical prior is very strong and the data are weak.

And this is then the computationally more involved but more stable approach.

So if I’m understanding correctly, changing the posterior too much will result in noisy importance weights that won’t be well approximated by the generalized Pareto distribution.

I’ll try to wrap my head around the quadrature approach in this case then.

A more accurate description would be: changing the posterior too much will result in larger variation in the raw importance ratios. If that variation is so large that the distribution of the ratios has infinite mean and variance, then it doesn’t matter even if we can approximate that distribution well with the generalized Pareto distribution, as the mean of that approximation will be infinite, too.
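For reference, the point about the mean can be read off the standard moment formulas of the generalized Pareto distribution with shape k, scale sigma, and location mu: the mean is finite only for k < 1 and the variance only for k < 1/2. A small sketch (the helper name `gpd_moments` is made up for this example):

```python
import math

def gpd_moments(k, sigma, mu=0.0):
    """Mean and variance of a generalized Pareto distribution.

    k is the shape, sigma > 0 the scale, mu the location.
    Returns math.inf for a moment that does not exist as a finite number.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Mean is finite only when k < 1.
    mean = mu + sigma / (1.0 - k) if k < 1.0 else math.inf
    # Variance is finite only when k < 1/2.
    var = sigma**2 / ((1.0 - k) ** 2 * (1.0 - 2.0 * k)) if k < 0.5 else math.inf
    return mean, var
```

So a fitted tail with, say, k = 0.7 still has a finite mean but an infinite variance, while k = 1.2 has neither, however good the fit to the tail is.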