This is more of a conceptual than a practical question about how to validate a fitted model, in particular how to proceed in the presence of clustered data.
For instance, let y_{ij}, with i = 1, \ldots, r and j = 1, \ldots, n_i, denote the observation for the j\text{-th} individual belonging to region i. In that case, I would like to fit two models
such that \mathbf{u} \sim G represents the spatial random effects and \epsilon_{ij} \overset{\text{i.i.d.}}{\sim} \text{N}(0, \sigma^2_{ij}).
Assume I fitted the models using, for example, Stan, and want to compare them based on the “leave-one-out cross-validation” procedure. To do so, I used the loo package and computed all the required quantities as in this vignette.
Then, I can analyze the results based on the loo_compare(loo_A, loo_B) output.
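To make concrete what such a comparison summarizes, here is a minimal sketch in plain Python (not the loo package itself). It assumes that each model yields one pointwise elpd contribution per observation, and that the comparison reports the summed difference together with a standard error computed as sqrt(N) times the sample standard deviation of the pointwise differences, which is my understanding of how loo_compare works. The function name compare_elpd and the numbers are hypothetical.

```python
import math
import statistics


def compare_elpd(pointwise_a, pointwise_b):
    """Return (elpd_diff, se_diff) of model B relative to model A.

    Both arguments are lists of pointwise elpd contributions, one entry
    per observation y_ij, given in the same order for both models.
    """
    if len(pointwise_a) != len(pointwise_b):
        raise ValueError("both models must be evaluated on the same observations")
    diffs = [b - a for a, b in zip(pointwise_a, pointwise_b)]
    n = len(diffs)
    elpd_diff = sum(diffs)
    # Assumed formula: sqrt(N) * sample standard deviation of the differences.
    se_diff = math.sqrt(n) * statistics.stdev(diffs)
    return elpd_diff, se_diff


# Hypothetical pointwise elpd values for four observations.
elpd_a = [-1.2, -0.8, -1.5, -1.0]
elpd_b = [-1.0, -0.9, -1.1, -0.7]
elpd_diff, se_diff = compare_elpd(elpd_a, elpd_b)
print(elpd_diff, se_diff)
```

Note that this summary treats each observation y_{ij} as one unit of comparison, which is exactly the point the question below is about.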
Finally, my question is: since I am dealing with a clustered data set (with clusters defined by the regions i), does it still make sense to use the (approximate) “leave-one-out” validation procedure? Or should I instead treat each cluster as one observation (and re-fit the model r times)?
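To spell out what I mean by the second option, here is a rough sketch in plain Python of the leave-one-cluster-out bookkeeping. It only builds the r train/test splits and scores one held-out region from posterior draws; the actual re-fitting is left out. The names leave_one_cluster_out and joint_log_pred_density are hypothetical, and the scoring function assumes that the pointwise log-likelihoods for the held-out region have already been evaluated under each draw of the model fitted without that region (including however the held-out region's random effect is handled).

```python
import math
from collections import defaultdict


def leave_one_cluster_out(regions):
    """Yield (held_out_region, train_idx, test_idx), one triple per region.

    `regions` holds the region label of every observation y_ij.
    """
    by_region = defaultdict(list)
    for idx, region in enumerate(regions):
        by_region[region].append(idx)
    for region, test_idx in by_region.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(regions)) if i not in held_out]
        yield region, train_idx, test_idx


def joint_log_pred_density(log_lik_draws):
    """Joint log predictive density of one held-out region.

    `log_lik_draws` is a list with one entry per posterior draw; each entry
    lists the pointwise log-likelihoods of the held-out observations.
    The result is the log of the average over draws of the joint likelihood,
    computed with the log-sum-exp trick for numerical stability.
    """
    totals = [sum(draw) for draw in log_lik_draws]
    m = max(totals)
    return m + math.log(sum(math.exp(t - m) for t in totals) / len(totals))


# Hypothetical region labels for six observations in r = 3 regions.
regions = ["a", "a", "b", "c", "c", "c"]
folds = list(leave_one_cluster_out(regions))
for region, train_idx, test_idx in folds:
    print(region, train_idx, test_idx)
```

Each of the r folds would require one re-fit on train_idx, after which the held-out region contributes a single joint term instead of n_i separate pointwise terms.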