Model comparison in latent variable models

I am running some latent variable models in brms, following the examples from here and here. The models I get from these all seem fine and converge appropriately.

I am running 3 different latent variable models, and I now want to compare them to see which best represents the structure of the data. However, when I try to calculate loo for a model I get the following error:

Error in while (t < nrow(acov) - 5 && !is.nan(rho_hat_even + rho_hat_odd) &&  : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
NAs were found in the log-likelihood. Possibly this is because some of your responses contain NAs. If you use 'mi' terms, try setting 'resp' to those response variables without missing values. Alternatively, use 'newdata' to predict only complete cases.

I get a similar message when computing WAIC.

I am using mi() terms to model the latent variable, but it is not clear to me where the resp argument should go.
I had read somewhere on this forum (although I can no longer find it, sorry) that computing loo with mi() model terms was inappropriate, so perhaps I should be using alternative model comparison statistics.
Are there alternative model comparison statistics I could use here?


  • Operating System: Mac OSX 10.15.7
  • brms Version: 2.14.4

This is hard to say without seeing the model specification, but based on the links you provided, I’m assuming that your latent variable in the dataframe is just a column vector of NA values. LOO and other information criteria aren’t going to be estimable when there is no observed data. These criteria are related to prediction accuracy, but if the response variable is not actually observed, then there’s no way to see how leaving out an observation affects the model’s prediction, since we don’t know the observed value to begin with.

The recommendation from the warning message is to specify the response variable as something that has observed data, but that doesn’t seem like what you’re interested in (i.e., you’re interested in the model’s ability to tell you about the unobserved latent variable). Depending on your goal here, you may try a different parameterization of your model. For example, IRT models and latent growth curve models can be estimated as generalized (non-)linear mixed models, which is brms’ wheelhouse. Alternatively, if you just want to do Bayesian SEM, then you may check out alternative packages like blavaan that are built specifically for that purpose, or you could just specify the model in Stan as discussed here.
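If you do want to try what the warning suggests, a minimal sketch would look something like the following (assuming a fitted model fit with an observed response y and a latent response modelled via mi(); the names y and dat are placeholders, not from your post):

```r
# Restrict the log-likelihood to the response(s) with no missing values,
# as suggested by the warning message:
loo_y <- loo(fit, resp = "y")

# Alternatively, evaluate the log-likelihood only on complete cases
# of the original data:
complete <- dat[complete.cases(dat), ]
loo_cc <- loo(fit, newdata = complete)
```

Just keep in mind that either way you are scoring prediction of the observed responses only, which may not answer your question about the latent structure.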

There’s potentially something you could do with the posteriors to compare models. RMSEA-based fit indices, for instance, could likely be computed to compare models. My personal recommendation would be to examine and compare posterior predictive checks: the best model would be the one that best captures the data-generating process. It sounds to me like the goal right now is to figure out the best-fitting model, which is probably easier to do in a dedicated SEM package.
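As a sketch of that last suggestion (assuming three fitted models fit1 through fit3 and an observed response named y; both are hypothetical stand-ins for your actual objects, and resp only matters if the model is multivariate):

```r
# Graphical posterior predictive checks for each candidate model,
# restricted to the observed response so the NAs in the latent
# variable don't get in the way.
pp_check(fit1, resp = "y", nsamples = 100)
pp_check(fit2, resp = "y", nsamples = 100)
pp_check(fit3, resp = "y", nsamples = 100)
```

Comparing these plots side by side should show which model's simulated data most resembles the observed data.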
