I have some candidate models and a large dataset. I want to compare these models using LOO-CV.
The dataset has ~20 natural subdivisions: different subjects in the same (very long) psychological experiment. Ideally, I would fit each candidate model as a multilevel/hierarchical model and then estimate LOOICs in the usual way. Unfortunately, the dataset is too big to feasibly fit any model to all of the data at once.
My current plan is to fit each candidate model separately to each subject (which isn't too taxing computationally), estimate a LOOIC for each subject-level model, and then combine the LOOICs across subject-level models of the same kind to estimate what the full model's ("supermodel's") LOOIC would be, roughly as in the sketch below.
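In code, the plan looks roughly like this (a Python sketch only; the fitting routine and PSIS-LOO computation are placeholder stand-ins I haven't settled on yet, included just so the sketch runs):

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder stand-ins so the sketch is self-contained; in reality these would
# be a proper per-subject Bayesian model fit and a proper PSIS-LOO computation.
def fit_candidate_model(y):
    return {"y": y}  # stands in for a fitted model object

def psis_loo_summary(fit):
    # stands in for PSIS-LOO; returns (elpd_loo, se_elpd_loo, p_loo)
    pw = rng.normal(-1.4, 0.6, size=len(fit["y"]))  # fake pointwise elpd values
    return pw.sum(), float(np.sqrt(len(pw) * pw.var())), 6.0

# ~20 subjects, each with a long series of trials (fake data for illustration)
data_by_subject = {f"subj{i:02d}": rng.normal(size=800) for i in range(20)}

# Fit the candidate model to each subject separately and keep the per-subject
# LOO summaries, to be combined afterwards (see my second question below).
per_subject_loo = {
    sid: psis_loo_summary(fit_candidate_model(y))
    for sid, y in data_by_subject.items()
}
```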
My questions are:
- I feel like this approach is equivalent to fitting a single candidate model to all of the data, but without partial pooling. With so much data per submodel, I know that partial pooling doesn't make much of a difference to the submodel parameter estimates. Besides losing partial pooling, is there anything else disadvantageous about this approach?
- If I have the estimated LOOIC, the effective number of parameters, and the standard error of the LOOIC for each submodel, how do I combine them to get an estimate of the supermodel's LOOIC? (The combination I have been assuming is sketched just below.)
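For concreteness, the combination I have been assuming (treating subjects as independent, so point estimates add and variances add across subjects) is the following, although I am not sure it is valid:

$$\widehat{\text{elpd}}_{\text{super}} = \sum_{k=1}^{K} \widehat{\text{elpd}}_k,\qquad p_{\text{loo},\text{super}} = \sum_{k=1}^{K} p_{\text{loo},k},\qquad \text{LOOIC}_{\text{super}} = -2\,\widehat{\text{elpd}}_{\text{super}} = \sum_{k=1}^{K}\text{LOOIC}_k,$$

$$\text{SE}\big(\text{LOOIC}_{\text{super}}\big) \approx \sqrt{\sum_{k=1}^{K}\text{SE}\big(\text{LOOIC}_k\big)^{2}}.$$

Is this the right way to combine them, or is there a better approach?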