Using `loo` for clustered data

avramaral · September 11, 2022, 7:28am

This is more a conceptual than a practical question about how to validate a fitted model, in particular how to proceed in the presence of clustered data.

For instance, let y_{ij}, for i = 1, \cdots, r and j = 1, \cdots, n_i, denote the observation for the j\text{-th} individual belonging to region i. In that case, I would like to fit two models

\begin{align} \mathcal{M}_{\text{A}}: y_{ij} &= \mathbf{x}_{ij}\boldsymbol{\beta} + u_i + \epsilon_{ij} \\ \mathcal{M}_{\text{B}}: y_{ij} &= \mathbf{x}_{ij}\boldsymbol{\beta} + \epsilon_{ij} \end{align}

where \mathbf{u} \sim G denotes the spatial random effects and \epsilon_{ij} \overset{\text{i.i.d.}}{\sim} \text{N}(0, \sigma^2_{ij}).

Assume I have fitted the models using, for example, Stan, and want to compare them using leave-one-out cross-validation. To do so, I used the loo package and computed all the required quantities as in this vignette.

Then, I can analyze the results based on the `loo_compare(loo_A, loo_B)` output.
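To make concrete what such a comparison summarizes, here is a minimal sketch in plain Python (not the loo package itself). It assumes pointwise elpd values are already available for both models; the function name `elpd_diff_and_se` and the toy numbers are purely illustrative. As far as I understand, the reported standard error is based on the spread of the pointwise differences, which is exactly where the clustering worries me.

```python
import math
import statistics


def elpd_diff_and_se(pointwise_a, pointwise_b):
    """Return the summed pointwise elpd difference (A minus B) and its standard error.

    Both inputs are per-observation elpd values, in the same observation order.
    The standard error treats the pointwise differences as roughly independent,
    which is an assumption that clustered data may violate.
    """
    if len(pointwise_a) != len(pointwise_b):
        raise ValueError("both models must be evaluated on the same observations")
    diffs = [a - b for a, b in zip(pointwise_a, pointwise_b)]
    n = len(diffs)
    elpd_diff = sum(diffs)
    # sqrt(n) times the sample standard deviation of the pointwise differences
    se_diff = math.sqrt(n) * statistics.stdev(diffs)
    return elpd_diff, se_diff


# Toy pointwise values (illustrative only, not from any fitted model)
toy_a = [-1.0, -1.2, -0.8, -1.1]
toy_b = [-1.3, -1.2, -1.0, -1.6]
diff, se = elpd_diff_and_se(toy_a, toy_b)
print(f"elpd_diff = {diff:.3f}, se_diff = {se:.3f}")
```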

Finally, my question is: since I am dealing with a clustered data set (with clusters defined by the regions i), does it still make sense to use the (approximate) leave-one-out validation procedure? Or should I instead treat each cluster as one observation (and refit the model r times)?
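The second option could be sketched as follows, in plain Python and without the actual refitting step. The helper names (`leave_one_group_out_folds`, `heldout_region_elpd`) are hypothetical, and scoring a held-out region by its joint log predictive density is one possible choice that I am assuming here, not something prescribed by the loo package.

```python
import math
from collections import defaultdict


def leave_one_group_out_folds(region_ids):
    """Build one fold per region: (region, train indices, held-out indices)."""
    by_region = defaultdict(list)
    for idx, region in enumerate(region_ids):
        by_region[region].append(idx)
    folds = []
    for region, test_idx in by_region.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(region_ids)) if i not in held_out]
        folds.append((region, train_idx, test_idx))
    return folds


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def heldout_region_elpd(log_lik_draws):
    """Joint log predictive density of one held-out region.

    log_lik_draws[s][j] is the log-likelihood of held-out observation j under
    posterior draw s from the model refit WITHOUT that region.
    """
    joint_per_draw = [sum(row) for row in log_lik_draws]
    return log_mean_exp(joint_per_draw)


# Toy region labels for five observations (illustrative only)
folds = leave_one_group_out_folds(["a", "a", "b", "c", "b"])
for region, train_idx, test_idx in folds:
    print(region, train_idx, test_idx)
```

Each fold would then require refitting the model on `train_idx` and evaluating the held-out log-likelihoods for `test_idx`, giving r refits in total.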

avehtari · September 12, 2022, 4:09pm

See the CV-FAQ entry “Can cross-validation be used for hierarchical / multilevel models?”, and the references and case studies listed in the answer. Let me know if the answer there is not clear, and I can try to improve it.
