My issue is similar to the one here: Using model comparison (loo or waic) after imputation, but I can't find any discussion more recent than 2021, so I'd like to revive the topic.
I'm using `brms`. I have a dataset with about 9000 observations where the outcome variable is a latent variable, so I'm using a calibration dataset to fit a linear model for an indicator variable, then `posterior_predict` to generate a list of datasets with imputed values, and `brm_multiple` to fit models on those imputed datasets. This is similar to what is described here: Handle Missing Values with brms.
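Roughly, the workflow looks like this; the data frame and variable names (`calib`, `main`, `outcome`, `indicator`, `x1`, `x2`) are placeholders, and I'm assuming the calibration model predicts the outcome from the indicator:

```r
library(brms)

# `calib`: calibration sample with both the outcome and the indicator observed
# `main`: the ~9000-row analysis sample where the outcome needs to be imputed

# 1. Calibration model relating the indicator to the outcome
calib_fit <- brm(outcome ~ indicator, data = calib)

# 2. Draw several posterior predictions for the main sample,
#    one draw per imputed dataset
n_imp <- 5
pp <- posterior_predict(calib_fit, newdata = main, ndraws = n_imp)
imputed <- lapply(seq_len(n_imp), function(i) {
  d <- main
  d$outcome <- pp[i, ]
  d
})

# 3. Fit the analysis model on the list of imputed datasets
fit <- brm_multiple(outcome ~ x1 + x2, data = imputed)
```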
I'm hoping to compare a number of models using k-fold cross-validation, since this seems like the best method for a dataset of this size. When I do this (or run other post-fitting diagnostics like `loo`, `waic`, or `bayes_R2`), I get the message:
Warning: Using only the first imputed data set. Please interpret the results with caution until a more principled approach has been implemented.
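For reference, any of the usual post-fit calls on the `brm_multiple` object produces that warning, e.g. something like:

```r
# Each of these emits the "first imputed data set" warning on a brm_multiple fit
kf <- kfold(fit, K = 10)
l  <- loo(fit)
w  <- waic(fit)
r2 <- bayes_R2(fit)
```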
So my question: is there a more principled approach? How would you compare model structures for a dataset like this one?