Translating Lavaan model to Stan - Low E-BFMI


I would like to translate a relatively simple Lavaan model to Stan. I’m aware of the blavaan package, but I plan to augment this model in the future with some specifications that are not available in blavaan, so it’s important that I learn how to code this model directly in Stan.

This is the model depicted in a path diagram:

Screenshot 2024-03-23 at 7.46.33 PM

Based on previous studies, I constrain the variances of the error terms E_1, E_2, and E_3 to be equal for identifiability.

Here’s the Lavaan model, which I run with the lavaan() and blavaan() functions’ default specifications:

model.lavaan <- '
  cs08 =~ 1 * cs08a247
  cs10 =~ 1 * cs10c247
  cs11 =~ 1 * cs11d247

  cs10 ~ beta21 * cs08
  cs11 ~ beta32 * cs10

  cs08a247 ~~ vare * cs08a247
  cs10c247 ~~ vare * cs10c247
  cs11d247 ~~ vare * cs11d247

  cs08 ~~ resvart08 * cs08
  cs10 ~~ resvart10 * cs10
  cs11 ~~ resvart11 * cs11

And here’s my current Stan code:

data {
  int<lower=1> N; // Number of observations
  vector[N] cs08a247; // Observed variable for cs08
  vector[N] cs10c247; // Observed variable for cs10
  vector[N] cs11d247; // Observed variable for cs11

parameters {
  real beta21; // Effect of cs08 on cs10
  real beta32; // Effect of cs10 on cs11
  real<lower=0> sde; // Error variance for observed variables
  real<lower=0> ressd08; // Residual variance of cs08
  real<lower=0> ressd10; // Residual variance of cs10
  real<lower=0> ressd11; // Residual variance of cs11
  vector[N] cs08_std; // Standard normal for cs08 latent variable
  vector[N] cs10_std; // Standard normal for cs10 latent variable
  vector[N] cs11_std; // Standard normal for cs11 latent variable

transformed parameters {
  vector[N] cs08 = ressd08 * cs08_std; // Apply non-centered parametrization
  vector[N] cs10 = beta21 * cs08 + ressd10 * cs10_std; // Apply non-centered parametrization
  vector[N] cs11 = beta32 * cs10 + ressd11 * cs11_std; // Apply non-centered parametrization

model {
  // Priors for beta coefficients
  beta21 ~ normal(0, 10);
  beta32 ~ normal(0, 10); 
  // Prior for variance parameters
  sde ~ gamma(1,.5);
  ressd08 ~ gamma(1,.5);
  ressd10 ~ gamma(1,.5);
  ressd11 ~ gamma(1,.5);
  // Priors for standard normal parameters
  cs08_std ~ normal(0, 1);
  cs10_std ~ normal(0, 1);
  cs11_std ~ normal(0, 1);

  // Measurement model
  cs08a247 ~ normal(cs08, sde);
  cs10c247 ~ normal(cs10, sde);
  cs11d247 ~ normal(cs11, sde);

My estimates are similar to those yielded by lavaan() and blavaan(), but I’m getting a low E-BFMI.

Could someone please assist me? I appreciate your time and help.

I think the main difference between your Stan code and the blavaan code is that you are sampling the latent variables as parameters and blavaan is not. It is possible to write this model as a multivariate normal of all three observed variables, with the latent variables integrated out of the likelihood. That really improves efficiency because you have many fewer parameters to sample and to autodiff. And it is possible to sample the latent variables in generated quantities if you want them.

Some further details appear in this discourse thread and in this JSS paper. Also, see this post that illustrates how the marginalization works for a specific model.

If you plan to do something outside of the usual SEM framework, though, the marginalization of latent variables could get tricky. It may depend on whether you plan to use a multivariate normal likelihood.


Thank you very much, @edm. Integrating the latent variables out of the likelihood removed the warning messages.