This is a conceptual rather than practical question about how to validate a fitted model, in particular how to proceed in the presence of clustered data.

For instance, let y_{ij}, with i = 1, \cdots, r and j = 1, \cdots, n_i, denote the observation for the j\text{-th} individual belonging to region i. In that case, I would like to fit two models

such that \mathbf{u} \sim G represents the spatial random effects and \epsilon_{ij} \overset{\text{i.i.d.}}{\sim} \text{N}(0, \sigma^2).
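To make the setup concrete, here is a minimal simulation of the data structure I have in mind. Since I omitted the exact model formulas, this assumes a simple random-intercept form (the parameter values and the Gaussian choice for G are illustrative only):

```python
# Hypothetical simulation of the clustered setup: r regions, n_i individuals
# per region, region-level random effects u_i ~ G (here taken to be
# N(0, tau^2)) and i.i.d. noise eps_ij ~ N(0, sigma^2).
import numpy as np

rng = np.random.default_rng(1)
r = 5                                   # number of regions i = 1, ..., r
n = rng.integers(8, 15, size=r)         # cluster sizes n_i
tau, sigma, beta0 = 1.0, 0.5, 2.0       # assumed (illustrative) parameters

u = rng.normal(0.0, tau, size=r)        # region random effects u_i
region = np.repeat(np.arange(r), n)     # region index of each observation
y = beta0 + u[region] + rng.normal(0.0, sigma, size=region.size)

print(y.shape, region.shape)
```

The point is only that observations sharing a region index also share u_i, which is what makes them clustered.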

Assume I fitted the models using, for example, Stan, and want to compare them based on the (approximate) "leave-one-out cross-validation" procedure. To do so, I used the `loo` package and computed all the required quantities as in this vignette.
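As I understand it, everything starts from the S x N pointwise log-likelihood matrix (S posterior draws, N observations). The following numpy sketch shows the plain importance-sampling LOO estimator on a toy normal-mean example; `loo` additionally applies Pareto smoothing (PSIS) to the importance weights, which this sketch deliberately omits:

```python
# Toy setup: posterior draws for a normal-mean model with known unit variance,
# then the S x N pointwise log-likelihood matrix that loo-style methods use.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=40)                          # data
S = 4000
mu = rng.normal(y.mean(), 1.0 / np.sqrt(y.size), size=S)   # posterior draws

# log p(y_i | mu_s), shape (S, N)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu[:, None]) ** 2

def logsumexp0(a):
    """Numerically stable log-sum-exp over axis 0."""
    m = a.max(axis=0)
    return m + np.log(np.exp(a - m).sum(axis=0))

# Plain importance-sampling LOO (no Pareto smoothing):
# log p(y_i | y_{-i}) ~= log S - log sum_s exp(-log_lik[s, i])
elpd_i = np.log(S) - logsumexp0(-log_lik)
print(elpd_i.sum())
```

The sum of `elpd_i` plays the role of `elpd_loo` in the `loo` output.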

Then, I can analyze the results based on the output of `loo_compare(loo_A, loo_B)`.

Finally, my question is: since I am dealing with a clustered data set (with clusters defined by the regions i), does it still make sense to use the (approximate) "leave-one-out" validation procedure? Or should I instead treat each cluster as one observation (and re-fit the model r times)?
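To clarify what I mean by "one observation per cluster" without necessarily re-fitting: I could sum the pointwise log-likelihood within each region, turning the S x N matrix into an S x r matrix, so that an importance-sampling LOO on its columns would approximate leave-one-cluster-out. A schematic numpy sketch (the stand-in matrix and sizes are made up):

```python
# Collapse a pointwise log-likelihood matrix to the cluster level:
# summing log-likelihood columns within a region gives one "observation"
# (one joint likelihood factor) per cluster.
import numpy as np

S, N, r_clusters = 100, 12, 3
rng = np.random.default_rng(2)
log_lik = rng.normal(-1.0, 0.3, size=(S, N))   # stand-in pointwise matrix
region = np.repeat(np.arange(r_clusters), N // r_clusters)

log_lik_cluster = np.stack(
    [log_lik[:, region == i].sum(axis=1) for i in range(r_clusters)],
    axis=1,
)
print(log_lik_cluster.shape)  # one column per region
```

Is this cluster-level version (or the full r re-fits) the more appropriate validation here, or is the observation-level approximation still justified?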