Help with multi-level SEM in Stan

Dear Stan community,

I want to expand some multivariate Stan models into structural equation models (SEM). I have found some resources on how to fit SEMs in Stan, but I want to fit a multi-level SEM where several random effects partitioned from the response variables contribute to the latent variable (eta), e.g.:

Y1_it = B_0 + i_0 + (B_1 + i_1) .* x + i_2 + e_it

Where i represents individual-level deviations from the mean (B_0) or the population slope (B_1).

According to the SEM literature:

i = (B_0i) + eta_i * lambda + e_i

where eta ~ N(0, 1)

Since i represents random deviations from the mean(s), I put this term in parentheses and did not fit it in my model. I have simulated some data with one latent variable caused by my six traits of interest (see attached R script) to assess whether my Stan model (pasted below) estimates the simulated lambdas correctly. I have been trying to get this model to work for some time now, but can’t seem to get it to estimate the lambdas correctly: the lambdas I simulated are all 0.5, but the model consistently estimates them as ~0.2. I checked with (b)lavaan whether the underlying latent structure of i is simulated correctly, and this does check out (see script). Thus there is something wrongly specified in my Stan model, and I have the following questions:

  1. What is going wrong with the structural part of the model?

    • I have tried fitting the model with an intercept per i trait, and also without the error part (e_i) in the structural part, but neither improved the estimation.
    • Does anyone have/know of some example code for a similar model setup? That would help me understand a lot better how the model is supposed to work. I am also curious how this would be fitted with multiple latent variables, and how to set or constrain the correlations between the latent variables; I assume this works similarly to normal correlation structures in Stan.
  2. I read in Merkle et al. (2021, JSS) that fitting a marginal likelihood would greatly improve estimation efficiency, but I was not able to figure out how to fit this in Stan exactly. Would this improve the model fit? (My attempt at a sketch of the marginal version follows after this list.)

  3. Is there a way to estimate the correlation structure of i when the structural model is fit? In the base model (see script) I estimated the correlation matrix of i without the latent variable structure, but I wonder how this changes when the structural part is fitted.

  4. This is probably a step for later, but I have a variable of interest for which I want to estimate a loading, even though it is not directly measured by the measurement model. It is instead derived from estimated parts of the measurement model, like phi = B_1 * x_i + i_2. In the base model I can use the linearity of (co)variance to estimate the respective correlations with phi, but can I also estimate the loading of a compound like phi in the structural part of i?
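
Regarding question 2, here is my current reading of the marginal approach, as a sketch only (I have not verified this): with a single latent factor, eta can be integrated out analytically, epsilon disappears, and i moves to the parameters block with a multivariate normal marginal whose covariance is lambda * lambda' + diag(theta^2):

parameters {
  matrix[Ni, 7] i;            // individual-level traits (eta and epsilon integrated out)
  vector[7] lambda;           // factor loadings
  vector<lower=0>[7] theta;   // unique/residual SDs
  // ... remaining parameters as before
}

model {
  // marginal covariance of i after integrating out eta ~ normal(0, 1)
  matrix[7, 7] Sigma_i = add_diag(lambda * lambda', square(theta));
  matrix[7, 7] L_i = cholesky_decompose(Sigma_i);
  for (n in 1:Ni)
    i[n]' ~ multi_normal_cholesky(rep_vector(0, 7), L_i);
  // ... measurement models and remaining priors as before
}

If this is correct, Sigma_i would also be the model-implied covariance of i, which seems relevant to question 3.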

Thanks a lot in advance!
Corné

data {
  int<lower=1> No;  // total number of observations
  int<lower=1> Ni;  // number of individuals
  array[No] int individual; // individual identity for each observation
  array[No] int<lower=1> opponent1; // opponent 1 ID
  array[No] int<lower=1> opponent2; // opponent 2 ID
  
  // Predictors
  vector[No] xi;     // focal observed x trait (with measurement error)
  vector[No] xjk;    // average social partner x trait
  
  // Response variables
  array[No] vector[2] zs; // z1 and z2
}

parameters {
  // Fixed effects
  vector[3] mu;           // intercepts: z1, z2, x
  vector[2] psi;          // plasticity slopes for z1 and z2
  
  // Latent variable parameters
  vector[Ni] eta;             // latent factor for each individual
  vector[7] lambda;           // factor loadings (one per trait)
  vector<lower=0>[7] theta;   // unique/residual SDs for each trait
  matrix[Ni, 7] epsilon;      // individual-specific unique trait deviations
  
  // Residuals
  cholesky_factor_corr[2] LR;   // residual correlation between z1 and z2
  vector<lower=0>[2] sd_R;      // residual SDs for z1 and z2
  real<lower=0> sigma_ex;       // residual SD for xi
}

transformed parameters {
  // Structural equation
  matrix[Ni, 7] i;  // individual-level trait values from latent + residuals
        i[,1] = lambda[1] * eta[1] + epsilon[,1];
        i[,2] = lambda[2] * eta[1] + epsilon[,2];
        i[,3] = lambda[3] * eta[1] + epsilon[,3];
        i[,4] = lambda[4] * eta[1] + epsilon[,4];
        i[,5] = lambda[5] * eta[1] + epsilon[,5];
        i[,6] = lambda[6] * eta[1] + epsilon[,6];
        i[,7] = lambda[7] * eta[1] + epsilon[,7];
}

model {
  // Expected values
  vector[No] e_z1;
  vector[No] e_z2;
  vector[No] e_x;
  
  // Measurement models

  // Equation for z1 
  e_z1 = mu[1] + i[individual, 1] + (psi[1] + i[individual, 2]) .* xjk
  + i[opponent1, 3] + i[opponent2, 3];
  
  // Equation for z2 
  e_z2 = mu[2] + i[individual, 4] + (psi[2] + i[individual, 5]) .* xjk
  + i[opponent1, 6] + i[opponent2, 6];
  
  // Equation for x
  e_x  = mu[3] + i[individual, 7];
  
  // Priors
  // Fixed effects
  mu ~ normal(0, 1);
  psi ~ normal(0, 1);
  
  // Latent structure
  eta ~ normal(0, 1);         // latent factor
  lambda ~ normal(0, 1);      // factor loadings
  //theta ~ exponential(3);     // unique variances
  
  // Unique/residual part of individual traits
  for (j in 1:7)
    epsilon[, j] ~ normal(0, theta[j]);
  
  // Residuals
  sd_R ~ exponential(3);
  sigma_ex ~ exponential(3);
  LR ~ lkj_corr_cholesky(1.5);
  
  // Multi-normal likelihood
  matrix[2, 2] L_sigma = diag_pre_multiply(sd_R, LR);
  array[No] vector[2] mus;
  for (o in 1:No)
      mus[o] = [e_z1[o], e_z2[o]]';
  zs ~ multi_normal_cholesky(mus, L_sigma);
  
  // Measurement error in xi
  xi ~ normal(e_x, sigma_ex);
}

I don’t think your script came through correctly, but I looked at the Stan model. The first thing that stands out is your transformed parameters block, where eta[1] appears on all the lines. I think you would want to use eta instead, without brackets.

I am not sure that you want what is typically called “multilevel SEM”. I find these terms confusing because “single level” SEM already allows for multiple response variables nested within person: the psychometrics people say this is a single multivariate response per person, as opposed to multiple univariate responses nested in person. The psychometrics people then reserve “two-level SEM” for a situation where the people are additionally nested within higher units like schools.

I’ve never understood SEM, but I have a similar question to edm. It looks like this should have all of eta in each line, at which point you can write it as a one-liner using an outer product.

vector[Ni] eta;             // latent factor for each individual
...
matrix[Ni, 7] i = epsilon + eta * lambda';

I’d check that I got the transposition right.

thanks @edm @Bob_Carpenter for the replies. Indeed it seems something went wrong with attaching the R script for the simulation etc., so I have attached it again here.
Stan_SEM_sim.R (15.5 KB)

My bad for using the wrong terminology for the type of SEM I was referring to; I’m still relatively new to working with SEMs. What would be an appropriate way to refer to this model?

I’ve edited the model code for eta, which previously referred to a matrix meant to accommodate multiple latent variables. When fitting the Stan model I am still getting lambda estimates that don’t recover the simulated values of 0.5; instead the lambdas are estimated as ~0.2. What else can be improved in the model code to get a better estimation of the lambdas?

Those lambda parameters usually have a sign indeterminacy, e.g., -lambda * -eta is the same as lambda * eta. An easy way to check whether this is a problem is to set the prior on lambda to normal(.5, .1) or something like that, and see whether you recover the correct values.
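
In this model that diagnostic is a one-line change to the prior in the model block, with 0.5 being the value used in the simulation:

  lambda ~ normal(0.5, 0.1);  // temporary diagnostic prior centered on the simulated loadings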

This sign indeterminacy issue comes up relatively often here, for example see Non-convergence of latent variable model - #13 by Mauricio_Garnier-Villarre

When that’s the case, it’s best to identify the model by pinning the sign of one of the parameters, e.g., declaring

parameters {
  real<lower=0> lambda;
  ...
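
For the seven loadings in this model, one way to do that (a sketch on my part; the other declarations stay as they are) is to pin only the first loading and rebuild the full vector:

parameters {
  real<lower=0> lambda1;   // first loading constrained positive for identification
  vector[6] lambda_rest;   // remaining loadings left unconstrained
  ...
}
transformed parameters {
  vector[7] lambda = append_row(lambda1, lambda_rest);
}

The latent scale is then fixed by eta ~ normal(0, 1) and the sign by lambda1 > 0.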

Unless there is a lot of data, we do not expect to recover parameters exactly as posterior means. For example, suppose I generate 10 variates from a standard normal and compute their mean; here are 10 runs:

>  for (n in 1:10) { y <- rnorm(10, mean=0, sd=1); print(c(mean(y), sd(y)), digits=2) }
[1] -0.41  0.79
[1] -0.074  1.100
[1] 0.21 0.93
[1] 0.14 0.86
[1] -0.43  0.95
[1] -1.0  1.4
[1] 0.19 0.85
[1] 0.031 1.021
[1] 0.29 1.17
[1] 0.16 0.79

So even though the simulated mean is 0 and standard deviation is 1, the recovered means and standard deviations are all over the place with only 10 draws. In fact, in this case, we know that the mean of 10 standard normal draws is distributed as normal(0, 1 / sqrt(10)). Here’s what it looks like with 1000 simulated data points rather than 10.

>  for (n in 1:10) { y <- rnorm(1000, mean=0, sd=1); print(c(mean(y), sd(y)), digits=2) }
[1] 0.033 1.005
[1] 0.036 1.010
[1] -0.029  1.025
[1] -0.0024  1.0212
[1] 0.0046 1.0166
[1] -0.081  1.021
[1] -0.013  0.995
[1] -0.0096  0.9921
[1] -0.059  1.007
[1] -0.045  1.014

What we do expect is to recover uncertainty in a calibrated way. That is, if we simulate from the model 100 times, we expect a binomial(100, 0.5) number of the estimated parameters to fall in their 50% posterior intervals.