Multilevel SEM with ordinal data at each level and latent moderated predictors

Pinging a few people who I know work with SEM @edm @Mauricio_Garnier-Villarre . Maybe they know of some relevant tutorials.

Stan is very flexible, so I would guess that it could estimate such a model. However, it might take a lot of tuning of priors/constraints to get it to converge. With all that in mind, here are some thoughts that will hopefully be helpful (my apologies if I retread basic SEM stuff that you’re already familiar with).

Learning Stan: blavaan is a Bayesian SEM package that uses lavaan syntax and estimates the model in Stan (or JAGS), while brms uses R’s formula notation for regression-type models generally (not SEMs). I doubt either of those can do everything you’re looking for, but they might get you most of the way there. That means you’ll probably need to learn base Stan if you want everything. The user guide is a great resource for learning the language. I ran through the early sections on basic models and picked up most of what I needed to get started. Also check out this thread.

Conditional vs Marginal models: SEMs can be expressed in two different ways. The conditional approach treats values of the latent variables as model parameters. For example, if you have 100 observations and 2 latent variables, then this would add 200 parameters to the model. In Kevin McKee’s post that you linked to, he uses this approach where matrix[N,D] z; indicates that he is treating the latent variables as parameters which he then uses to predict the outcomes (y[n, d, q] ~ ordered_logistic( z[n,d] * lambda[q, d], c[q, d]);). Note that the values of y are assumed to be independent of one another conditional on z.
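To make the conditional likelihood concrete, here is a minimal sketch in plain Python (not Stan) of what that sampling statement evaluates for one person with a single latent variable. The function names, the single-factor setup, and the example values are my own assumptions for illustration; the only thing taken from Stan is the ordered-logistic parameterization, where P(y <= k) = inv_logit(c_k - eta).

```python
import math

def inv_logit(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def ordered_logistic_probs(eta, cutpoints):
    """Probabilities of the K = len(cutpoints) + 1 ordered categories.

    Uses P(y <= k) = inv_logit(c_k - eta); cutpoints must be increasing.
    """
    cdf = [inv_logit(c - eta) for c in cutpoints] + [1.0]
    probs = []
    prev = 0.0
    for p in cdf:
        probs.append(p - prev)
        prev = p
    return probs

def conditional_loglik(y, z, lam, cutpoints):
    """log p(y | z) for one person and one latent variable.

    y: observed categories (1..K), one per item
    z: the person's latent score, treated as a parameter
    lam: one loading per item
    cutpoints: one list of cutpoints per item
    Items are independent given z, so the log-probabilities simply add.
    """
    total = 0.0
    for q, y_q in enumerate(y):
        probs = ordered_logistic_probs(z * lam[q], cutpoints[q])
        total += math.log(probs[y_q - 1])
    return total

# Hypothetical example: 3 items with 4 categories each
example_ll = conditional_loglik(
    y=[2, 3, 1],
    z=0.4,
    lam=[1.0, 0.8, 1.2],
    cutpoints=[[-1.0, 0.0, 1.0]] * 3,
)
```

Because the sum runs over items for a fixed z, the conditional independence assumption is visible directly in the code: all dependence between items has to come through z.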

The marginal approach uses parameters describing the latent variable distributions (loadings, variances, and covariances) to generate a model-implied covariance matrix, which is used to model y without estimating the latent variables themselves. In other words, the number of model parameters does not scale with the number of observations. If the model requires 10 parameters with 100 observations, then it will require 10 parameters for 10,000 observations. It can be very efficient (see paper here or post here) but is easiest for multivariate normal variables. To do this in Stan with ordinal indicators, you would need to specify a multivariate ordered distribution, which is not built in and which I have not seen anyone do yet.
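For continuous indicators, the model-implied covariance matrix in a standard factor model is Sigma = Lambda * Phi * Lambda' + Theta (loadings, latent covariance matrix, residual variances). Below is a rough sketch in plain Python of how that matrix is built; the helper names and the numeric values are made up for illustration, and Theta is assumed to be diagonal.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]

def implied_covariance(lam, phi, theta_diag):
    """Sigma = Lambda * Phi * Lambda' + Theta, assuming Theta is diagonal."""
    sigma = matmul(matmul(lam, phi), transpose(lam))
    for i, resid_var in enumerate(theta_diag):
        sigma[i][i] += resid_var
    return sigma

# Hypothetical values: 4 indicators loading on 2 correlated factors
lam = [
    [1.0, 0.0],
    [0.8, 0.0],
    [0.0, 1.0],
    [0.0, 0.7],
]
phi = [
    [1.0, 0.3],
    [0.3, 1.0],
]
theta_diag = [0.5, 0.5, 0.5, 0.5]

sigma = implied_covariance(lam, phi, theta_diag)
```

Nothing in this construction depends on the number of observations: the same handful of loadings, factor covariances, and residual variances defines Sigma whether you have 100 rows or 10,000.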

Missing data: Either approach (conditional or marginal) can be used to handle missing data in a FIML-like way (see here). Alternatively, you can always treat missing values as parameters as with multiple imputation and splice them in (see here). In some cases, you may need to use both strategies together.
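As an illustration of the FIML-like idea for the multivariate normal case: each row contributes the likelihood of only the variables it actually observed, which means subsetting the mean vector and covariance matrix to the observed positions. Here is a minimal sketch in plain Python, assuming missing values are coded as None; the function name and example numbers are hypothetical.

```python
def observed_block(y, mu, sigma):
    """Keep only the entries of y, mu, and sigma where y is observed.

    y: one row of data, with None marking a missing value
    mu: model-implied mean vector
    sigma: model-implied covariance matrix (list of rows)
    Returns (y_obs, mu_obs, sigma_obs) for the observed variables only.
    """
    idx = [i for i, value in enumerate(y) if value is not None]
    y_obs = [y[i] for i in idx]
    mu_obs = [mu[i] for i in idx]
    sigma_obs = [[sigma[i][j] for j in idx] for i in idx]
    return y_obs, mu_obs, sigma_obs

# Hypothetical row with the second of three variables missing
row = [1.2, None, -0.4]
mu = [0.0, 0.5, 1.0]
sigma = [
    [1.0, 0.3, 0.2],
    [0.3, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
y_obs, mu_obs, sigma_obs = observed_block(row, mu, sigma)
```

The reduced pieces would then go into a multivariate normal density for that row. In the conditional approach the analogous move is even simpler: skip the likelihood term for any missing item.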

Building the model: One of the nice things about Stan is that you can keep adding complexity to the model without having to jump to another package. In R, by contrast, you have to jump from lm to lmer once you add in the multilevel structure. If you do end up using Stan, I’d suggest you break down the problem into discrete steps and build it up that way. Something like

  1. Single-level SEM with continuous indicators, listwise deletion
  2. Single-level SEM with continuous indicators, handle missing data
  3. Single-level SEM with ordinal indicators
  4. Two-level SEM with ordinal indicators (no cross-level interactions)
  5. Two-level SEM with ordinal indicators (with cross-level interactions)
  6. Three-level SEM with ordinal indicators

You might have a more reasonable sequence. For example, you might switch steps 3 and 6 and push off the complexity introduced by the ordinal variables. But the point (that I often neglect myself) is that Stan is a great tool for starting simple and building in complexity along the way. It is much easier to get a simple model working and make it slightly more complex than it is to start with a massive, complex model and make it work (again, guilty as charged). Especially if you’re just learning the language.
