I am trying to make sense of why brms/Stan uses LOO-CV to compare nested Bayesian hierarchical models. A simple example: suppose you calculate LOO-CV for both of these models:
m1: outcome ~ predictor + (1|subject)
m2: outcome ~ (1 | subject)
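For concreteness, this is roughly how I fit them (a sketch; `dat` is a long-format data frame with columns `outcome`, `predictor`, and `subject`, built from the example data below):

```r
library(brms)

# Fit both models; priors and sampler settings left at brms defaults
m1 <- brm(outcome ~ predictor + (1 | subject), data = dat)
m2 <- brm(outcome ~ (1 | subject), data = dat)
```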
Then, if there is very little within-subject variation, each "left-out" data point will be easily predictable using information from the other data points of that subject. For example, suppose the data are:
| Subj | Predictor | Outcomes |
| --- | --- | --- |
| S1 | 0 | 10.5, 10.6, 10.3, 10.1, 10.9 |
| S2 | 0 | 21.2, 21.4, 21.5, 21.9, 21.3 |
| S3 | 0 | 10.2, 10.3, 10.5, 10.6, 10.9 |
| S4 | 0 | 12.3, 12.5, 12.9, 13.0, 13.1 |
| S5 | 0 | 22.1, 22.1, 22.1, 23.1, 23.3 |
| S6 | 0 | 14.2, 14.4, 14.5, 14.9, 14.3 |
| S7 | 0 | 20.5, 20.6, 20.3, 20.1, 20.9 |
| S8 | 0 | 21.2, 21.4, 21.5, 21.9, 21.3 |
| S9 | 0 | 20.2, 20.3, 20.5, 20.6, 20.9 |
| S10 | 0 | 22.3, 22.5, 22.9, 23.0, 23.1 |
| S11 | 1 | 90.5, 90.6, 90.3, 90.1, 90.9 |
| S12 | 1 | 91.2, 91.4, 91.5, 91.9, 91.3 |
| S13 | 1 | 90.2, 90.3, 90.5, 90.6, 90.9 |
| S14 | 1 | 92.3, 92.5, 92.9, 93.0, 93.1 |
| S15 | 1 | 92.1, 92.1, 92.1, 93.1, 93.3 |
| S16 | 1 | 94.2, 94.4, 94.5, 94.9, 94.3 |
| S17 | 1 | 90.5, 90.6, 90.3, 90.1, 90.9 |
| S18 | 1 | 91.2, 91.4, 91.5, 91.9, 91.3 |
| S19 | 1 | 90.2, 90.3, 90.5, 90.6, 90.9 |
| S20 | 1 | 92.3, 92.5, 92.9, 93.0, 93.1 |
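For reference, here is the same data as a long-format data frame (a sketch; `dat`, `outcomes`, and the column names are my own labels):

```r
# Reconstruct the table above in long format: one row per observation
outcomes <- list(
  S1  = c(10.5, 10.6, 10.3, 10.1, 10.9),
  S2  = c(21.2, 21.4, 21.5, 21.9, 21.3),
  S3  = c(10.2, 10.3, 10.5, 10.6, 10.9),
  S4  = c(12.3, 12.5, 12.9, 13.0, 13.1),
  S5  = c(22.1, 22.1, 22.1, 23.1, 23.3),
  S6  = c(14.2, 14.4, 14.5, 14.9, 14.3),
  S7  = c(20.5, 20.6, 20.3, 20.1, 20.9),
  S8  = c(21.2, 21.4, 21.5, 21.9, 21.3),
  S9  = c(20.2, 20.3, 20.5, 20.6, 20.9),
  S10 = c(22.3, 22.5, 22.9, 23.0, 23.1),
  S11 = c(90.5, 90.6, 90.3, 90.1, 90.9),
  S12 = c(91.2, 91.4, 91.5, 91.9, 91.3),
  S13 = c(90.2, 90.3, 90.5, 90.6, 90.9),
  S14 = c(92.3, 92.5, 92.9, 93.0, 93.1),
  S15 = c(92.1, 92.1, 92.1, 93.1, 93.3),
  S16 = c(94.2, 94.4, 94.5, 94.9, 94.3),
  S17 = c(90.5, 90.6, 90.3, 90.1, 90.9),
  S18 = c(91.2, 91.4, 91.5, 91.9, 91.3),
  S19 = c(90.2, 90.3, 90.5, 90.6, 90.9),
  S20 = c(92.3, 92.5, 92.9, 93.0, 93.1)
)

dat <- data.frame(
  subject   = rep(names(outcomes), each = 5),  # S1..S20, 5 observations each
  predictor = rep(c(0, 1), each = 50),         # S1-S10 -> 0, S11-S20 -> 1
  outcome   = unlist(outcomes)
)
```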
In these data:

- all scores for predictor = 0 are between 10 and 23.9,
- all scores for predictor = 1 are between 90 and 95,

=> the predictor clearly plays an important role here. In addition, within each subject there is relatively little variation.
What I don't understand is this: if you run LOO-CV on m2, the prediction for each left-out observation can still use the information from that observation's cluster. For example, if the S3 observation with outcome = 10.5 is left out, the model can still predict it from S3's other four outcomes (10.2, 10.3, 10.6, 10.9), which allow a highly accurate prediction. The same goes for every other observation.
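To make that concrete, here is what brute-force exact LOO for that single observation would look like, as far as I understand it (a sketch; `dat`, `m2`, and the row index `i` come from my sketches above):

```r
# Refit m2 without the (S3, 10.5) observation, then score the held-out point
i <- which(dat$subject == "S3" & dat$outcome == 10.5)
m2_minus_i <- update(m2, newdata = dat[-i, ])

# Pointwise log predictive density, averaged over the S posterior draws:
#   elpd_i = log( (1/S) * sum_s p(y_i | theta^(s)) )
ll <- log_lik(m2_minus_i, newdata = dat[i, , drop = FALSE])  # draws x 1 matrix
elpd_i <- log(mean(exp(ll[, 1])))  # use log-sum-exp in general for stability
```

S3 is still a known group level in the refit (its other four rows remain in the data), so this prediction can borrow all of S3's remaining information, which is exactly my worry.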
Now, m1 has the additional predictor variable, but in the example above this predictor provides little extra information once you know the subject: there is more variation between subjects at each value of the predictor than within each subject, so the subject intercepts already capture nearly all of the predictable structure.
So presumably LOO-CV will report little additional benefit of m1 over m2 when predicting new data. But that seems obviously wrong, given how cleanly the predictor separates the two groups.
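Concretely, the comparison I have in mind is the standard one (a sketch; `m1` and `m2` as above):

```r
# PSIS-LOO for each model, then the pairwise comparison
loo_m1 <- loo(m1)
loo_m2 <- loo(m2)
loo_compare(loo_m1, loo_m2)

# Per-observation elpd contributions, to see where the models actually differ
head(loo_m1$pointwise)
```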
Obviously this is a contrived example, but I don't see how the LOO-CV procedure can say anything meaningful about comparisons between hierarchical models, especially between nested models like these.
I'm sorry, I'm clearly missing something quite fundamental here, but I haven't been able to find an explanation.
Does LOO-CV do anything to take this into account (e.g. some form of "leave-one-cluster-out" CV), or provide a diagnostic for it?
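From what I can tell, `kfold()` in brms has a `folds = "grouped"` option that keeps whole clusters together, which sounds like the "leave-one-cluster-out" idea; is that the intended tool here? A sketch of what I mean (with 20 subjects, `K = 20` would be leave-one-subject-out):

```r
# Grouped K-fold CV: entire subjects are held out together, so the model
# cannot use a subject's other observations when predicting that subject
kf1 <- kfold(m1, folds = "grouped", group = "subject", K = 20)
kf2 <- kfold(m2, folds = "grouped", group = "subject", K = 20)
loo_compare(kf1, kf2)
```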
Thanks