Multilevel SEM with ordinal data at each level and latent moderated predictors

simonbrauer · May 25, 2023, 1:05pm

Pinging a few people who I know work with SEM: @edm and @Mauricio_Garnier-Villarre. Maybe they know of some relevant tutorials.

Stan is very flexible, so I would guess that it could estimate such a model. However, it might take a lot of tuning of priors/constraints to get it to converge. With all that in mind, here are some thoughts that will hopefully be helpful (my apologies if I retread basic SEM stuff that you’re already familiar with).

Learning Stan: blavaan is a Bayesian SEM package that uses lavaan syntax and estimates the model in Stan (or JAGS), while brms uses R’s formula notation for models generally (not SEMs). I doubt either of those can do everything you’re looking for, but they might get you most of the way there. If you want everything, you’ll probably need to learn base Stan. The user guide is a great resource for learning the language. I ran through the early sections on basic models and picked up most of what I needed to get started. Also check out this thread.

Conditional vs Marginal models: SEMs can be expressed in two different ways. The conditional approach treats the values of the latent variables as model parameters. For example, if you have 100 observations and 2 latent variables, this adds 200 parameters to the model. Kevin McKee’s post that you linked to uses this approach: the declaration matrix[N,D] z; makes the latent variables parameters, which he then uses to predict the outcomes (y[n, d, q] ~ ordered_logistic( z[n,d] * lambda[q, d], c[q, d]);). Note that the values of y are assumed to be independent of one another conditional on z.
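To make the conditional likelihood concrete, here is a rough sketch in plain Python (not Stan) of what a statement like that ordered_logistic line evaluates for one person. The function names and the nested-list layout are my own illustrative choices, not anything from McKee's post; the indexing simply mirrors y[n, d, q], lambda[q, d], and c[q, d] from the quoted line, and the category probabilities follow Stan's ordered-logistic parameterization.

```python
import math

def inv_logit(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def ordered_logistic_probs(eta, cutpoints):
    """Probabilities of the K = len(cutpoints) + 1 ordered categories.

    P(y > k) = inv_logit(eta - c_k), so each category probability is the
    difference between two adjacent exceedance probabilities.
    """
    exceed = [inv_logit(eta - c) for c in cutpoints]
    upper = [1.0] + exceed   # P(y > k - 1), with P(y > 0) = 1
    lower = exceed + [0.0]   # P(y > k), with P(y > K) = 0
    return [u - l for u, l in zip(upper, lower)]

def conditional_loglik(y_n, z_n, lam, cuts):
    """Log-likelihood of one person's responses given their latent scores.

    y_n[d][q]  : observed category (1..K) on item q of factor d
    z_n[d]     : latent score on factor d (a parameter in this approach)
    lam[q][d]  : loading of item q on factor d
    cuts[q][d] : ordered cutpoints for item q of factor d
    Items are independent given z, so the contributions simply add up.
    """
    total = 0.0
    for d, z in enumerate(z_n):
        for q, y in enumerate(y_n[d]):
            probs = ordered_logistic_probs(z * lam[q][d], cuts[q][d])
            total += math.log(probs[y - 1])
    return total
```

Because every person gets their own z_n, the number of latent-score parameters is N times D, which is where the 200 in the example above comes from.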

The marginal approach uses parameters describing the latent variable distributions (loadings, variances, and covariances) to generate a model-implied covariance matrix, which is used to model y without estimating the latent variables themselves. In other words, the number of model parameters does not scale with the number of observations. If the model requires 10 parameters with 100 observations, then it will still require 10 parameters with 10,000 observations. It can be very efficient (see paper here or post here) but is easiest for multivariate normal variables. To do this in Stan with ordinal data, you would need to specify a multivariate ordered distribution, which is not built in and which I have not seen anyone do yet.
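For the continuous case, the model-implied covariance matrix can be sketched as follows. This is plain Python rather than Stan, and it assumes the standard confirmatory factor form Sigma = Lambda * Phi * Lambda' + Theta with a diagonal residual matrix; the function name and the example values are made up for illustration.

```python
def implied_cov(lam, phi, theta):
    """Model-implied covariance: Lambda * Phi * Lambda^T + diag(theta).

    lam[i][a] : loading of indicator i on factor a
    phi[a][b] : covariance between factors a and b
    theta[i]  : residual variance of indicator i
    """
    p, d = len(lam), len(phi)
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            sigma[i][j] = sum(
                lam[i][a] * phi[a][b] * lam[j][b]
                for a in range(d)
                for b in range(d)
            )
        sigma[i][i] += theta[i]  # residual variance only on the diagonal
    return sigma

# One factor with three indicators (illustrative values only).
lam = [[1.0], [0.8], [0.6]]
phi = [[1.0]]
theta = [0.5, 0.5, 0.5]
sigma = implied_cov(lam, phi, theta)
```

Nothing passed to implied_cov depends on the sample size, which is the sense in which the marginal parameter count stays fixed as observations are added.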

Missing data: Either approach (conditional or marginal) can be used to handle missing data in a FIML-like way (see here). Alternatively, you can always treat missing values as parameters as with multiple imputation and splice them in (see here). In some cases, you may need to use both strategies together.
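As a rough illustration of the FIML-like idea in the marginal, multivariate normal case: each row contributes the density of only the variables it actually observed, which for a normal model means subsetting the mean vector and covariance matrix to the observed positions. The sketch below is plain Python, not Stan; the function names are hypothetical, and missing values are assumed to be coded as None.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def mvn_logpdf(x, mu, sigma):
    """Multivariate normal log density; 0.0 when there is nothing observed."""
    k = len(x)
    if k == 0:
        return 0.0
    low = cholesky(sigma)
    # Forward substitution: solve low * y = x - mu.
    y = []
    for i in range(k):
        s = sum(low[i][j] * y[j] for j in range(i))
        y.append((x[i] - mu[i] - s) / low[i][i])
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(k))
    return -0.5 * (k * math.log(2.0 * math.pi) + logdet + sum(v * v for v in y))

def fiml_row_loglik(row, mu, sigma):
    """Log-likelihood of one row using only its observed entries."""
    obs = [i for i, v in enumerate(row) if v is not None]
    x = [row[i] for i in obs]
    m = [mu[i] for i in obs]
    s = [[sigma[i][j] for j in obs] for i in obs]
    return mvn_logpdf(x, m, s)
```

Summing fiml_row_loglik over rows gives the full-information log-likelihood without deleting incomplete cases. In the conditional approach the analogous move is simpler still: items are independent given z, so a missing item is just left out of that person's sum.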

Building the model: One of the nice things about Stan is that you can keep adding complexity to the model without having to jump to another package. In R, by contrast, you have to jump from lm to lmer once you add in the multilevel structure. If you do end up using Stan, I’d suggest you break the problem down into discrete steps and build it up that way. Something like:

1. Single-level SEM with continuous indicators, listwise deletion
2. Single-level SEM with continuous indicators, handle missing data
3. Single-level SEM with ordinal indicators
4. Two-level SEM with ordinal indicators (no cross-level interactions)
5. Two-level SEM with ordinal indicators (with cross-level interactions)
6. Three-level SEM with ordinal indicators

You might have a more reasonable sequence. For example, you might switch steps 3 and 6 and push off the complexity introduced by the ordinal variables. But the point (that I often neglect myself) is that Stan is a great tool for starting simple and building in complexity along the way. It is much easier to get a simple model working and make it slightly more complex than it is to start with a massive, complex model and make it work (again, guilty as charged). Especially if you’re just learning the language.
