Help with multi-level SEM in Stan

Dear Stan community,

I want to expand some multivariate Stan models into structural equation models (SEM). I have found some resources on how to fit SEMs in Stan, but I want to fit a multi-level SEM where several random effects partitioned from the response variables contribute to the latent variable (eta), e.g.:

Y1_it = B_0 + i_0 + (B_1 + i_1) .* x + i_2 + e_it

Where i represents individual-level deviations from the mean (B_0) or the population slope (B_1).

According to the SEM literature:

i = (B_0i) + eta_i * lambda + e_i

where eta ~ N(0, 1)

Since i represents random deviations from the mean(s), I put the intercept B_0i in parentheses and did not fit it in my model. I have simulated some data with one latent variable underlying my six traits of interest (see attached R script) to assess whether my Stan model (pasted below) estimates the simulated lambdas correctly. I have been trying to get this model to work for some time now, but can’t seem to get it to estimate the lambdas correctly: the lambdas I simulated are all 0.5, but the model consistently estimates them as 0.2. I checked with (b)lavaan whether the underlying latent structure of i is simulated correctly, and this does check out (see script). So something is wrongly specified in my Stan model, and therefore I have the following questions:

  1. What is going wrong with the structural part of the model?

    • I have tried fitting the model with an intercept per i trait, and also without the error part (e_i) in the structural part, but neither improved the estimation.
    • Does anyone have or know of some example code for a similar model setup? That would help me understand a lot better how the model is supposed to work. I am also curious how this would be fitted with multiple latent variables, and how to set or constrain the correlations between the latent variables. I assume this works similarly to normal correlation structures in Stan.
  2. I read in Merkle et al. (2021, JSS) that fitting a marginal likelihood would greatly improve estimation efficiency, but I was not able to figure out how to fit this in Stan exactly. Would this improve the model fit?

  3. Is there a way to estimate the correlation structure of i when the structural model is fit? In the base model (see script) I estimated the correlation matrix of i without the latent variable structure, but I wonder how this changes when the structural part is fitted.

  4. This is probably a step for later, but I have a variable of interest for which I want to estimate a loading, and which is not directly measured by the measurement model. It is instead derived from estimated parts of the measurement model, like phi = B_1 * x_i + i_2. In the base model I can use the linearity of (co)variance to estimate the respective correlations with phi. But can I also estimate the loading of a derived quantity like phi in the structural part of i?

Thanks a lot in advance!
Corné

data {
  int<lower=1> No;  // total number of observations
  int<lower=1> Ni;  // number of individuals
  array[No] int individual; // individual identity for each observation
  array[No] int<lower=1> opponent1; // opponent 1 ID
  array[No] int<lower=1> opponent2; // opponent 2 ID
  
  // Predictors
  vector[No] xi;     // focal observed x trait (with measurement error)
  vector[No] xjk;    // average social partner x trait
  
  // Response variables
  array[No] vector[2] zs; // z1 and z2
}

parameters {
  // Fixed effects
  vector[3] mu;           // intercepts: z1, z2, x
  vector[2] psi;          // plasticity slopes for z1 and z2
  
  // Latent variable parameters
  vector[Ni] eta;             // latent factor for each individual
  vector[7] lambda;           // factor loadings (one per trait)
  vector<lower=0>[7] theta;   // unique/residual variances for each trait
  matrix[Ni, 7] epsilon;      // individual-specific unique trait deviations
  
  // Residuals
  cholesky_factor_corr[2] LR;   // residual correlation between z1 and z2
  vector<lower=0>[2] sd_R;      // residual SDs for z1 and z2
  real<lower=0> sigma_ex;       // residual SD for xi
}

transformed parameters {
  // Structural equation
  matrix[Ni, 7] i;  // individual-level trait values from latent + residuals
  i[, 1] = lambda[1] * eta[1] + epsilon[, 1];
  i[, 2] = lambda[2] * eta[1] + epsilon[, 2];
  i[, 3] = lambda[3] * eta[1] + epsilon[, 3];
  i[, 4] = lambda[4] * eta[1] + epsilon[, 4];
  i[, 5] = lambda[5] * eta[1] + epsilon[, 5];
  i[, 6] = lambda[6] * eta[1] + epsilon[, 6];
  i[, 7] = lambda[7] * eta[1] + epsilon[, 7];
}

model {
  // Expected values
  vector[No] e_z1;
  vector[No] e_z2;
  vector[No] e_x;
  
  // Measurement models

  // Equation for z1 
  e_z1 = mu[1] + i[individual, 1] + (psi[1] + i[individual, 2]) .* xjk
  + i[opponent1, 3] + i[opponent2, 3];
  
  // Equation for z2 
  e_z2 = mu[2] + i[individual, 4] + (psi[2] + i[individual, 5]) .* xjk
  + i[opponent1, 6] + i[opponent2, 6];
  
  // Equation for x
  e_x  = mu[3] + i[individual, 7];
  
  // Priors
  // Fixed effects
  mu ~ normal(0, 1);
  psi ~ normal(0, 1);
  
  // Latent structure
  eta ~ normal(0, 1);         // latent factor
  lambda ~ normal(0, 1);      // factor loadings
  //theta ~ exponential(3);     // unique variances
  
  // Unique/residual part of individual traits
  for (j in 1:7)
    epsilon[, j] ~ normal(0, theta[j]);
  
  // Residuals
  to_vector(sd_R) ~ exponential(3);
  sigma_ex ~ exponential(3);
  LR ~ lkj_corr_cholesky(1.5);
  
  // Multi-normal likelihood
  matrix[2, 2] L_sigma = diag_pre_multiply(sd_R, LR);
  array[No] vector[2] mus;
  for (o in 1:No)
      mus[o] = [e_z1[o], e_z2[o]]';
  zs ~ multi_normal_cholesky(mus, L_sigma);
  
  // Measurement error in xi
  xi ~ normal(e_x, sigma_ex);
}

I don’t think your script came through correctly, but I looked at the Stan model. The first thing that stands out is your transformed parameters block, where eta[1] appears on all the lines. I think you would want to use eta instead, without brackets.

I am not sure that you want what is typically called “multilevel SEM”. I find these terms confusing because “single-level” SEM already allows for multiple response variables nested within person: the psychometrics people say this is a single multivariate response per person, as opposed to multiple univariate responses nested in person. They then reserve “two-level SEM” for situations where the people are additionally nested within higher units like schools.

I’ve never understood SEM, but I have a similar question to edm’s. It looks like this should have all of eta in each line, at which point you can write it as a one-liner using an outer product.

vector[Ni] eta;             // latent factor for each individual
...
matrix[Ni, 7] i = epsilon + eta * lambda';

I’d check that I got the transposition right.
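
Checking the dimensions, assuming the declarations from the original model:

vector[Ni] eta;         // Ni x 1 column vector
vector[7] lambda;       // 7 x 1 column vector
matrix[Ni, 7] epsilon;
// lambda' is a 1 x 7 row_vector, so eta * lambda' is an Ni x 7 matrix,
// which matches epsilon:
matrix[Ni, 7] i = epsilon + eta * lambda';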

Thanks @edm @Bob_Carpenter for the replies. Indeed, it seems something went wrong with attaching the R script for the simulation, so I have attached it again here.
Stan_SEM_sim.R (15.5 KB)

My bad for using the wrong terminology for the type of SEM I was referring to; I’m still relatively new to working with SEMs. What would be an appropriate way to refer to this model?

I’ve edited the model code for eta, which before referred to a matrix meant to accommodate multiple latent variables. When fitting the Stan model I am still getting lambda estimates that don’t recover the simulated values of 0.5, instead estimating the lambdas as ~0.2. What else can be improved in the model code to get better estimates of the lambdas?

Those lambda parameters usually have a sign indeterminacy, e.g., -lambda * -eta is the same as lambda * eta. An easy way to check whether this is a problem is to set the prior on lambda to normal(.5, .1) or something like that, and see whether you recover the correct values.

This sign indeterminacy issue comes up relatively often here, for example see Non-convergence of latent variable model - #13 by Mauricio_Garnier-Villarre

When that’s the case, it’s best to identify the model by pinning the sign of one of the parameters, e.g., declaring

parameters {
  real<lower=0> lambda;
  ...
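
For the vector of loadings in this thread’s model, a minimal sketch of the same idea would be to pin just the first loading positive (lambda1 and lambda_rest are hypothetical names here):

parameters {
  real<lower=0> lambda1;    // first loading constrained positive
  vector[6] lambda_rest;    // remaining loadings left unconstrained
}
transformed parameters {
  vector[7] lambda = append_row(lambda1, lambda_rest);
}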

Unless there is a lot of data, we do not expect to recover parameters exactly as posterior means. For example, suppose I generate 10 variates from a standard normal and compute their mean and standard deviation; here are 10 runs:

>  for (n in 1:10) { y <- rnorm(10, mean=0, sd=1); print(c(mean(y), sd(y)), digits=2) }
[1] -0.41  0.79
[1] -0.074  1.100
[1] 0.21 0.93
[1] 0.14 0.86
[1] -0.43  0.95
[1] -1.0  1.4
[1] 0.19 0.85
[1] 0.031 1.021
[1] 0.29 1.17
[1] 0.16 0.79

So even though the simulated mean is 0 and standard deviation is 1, the recovered means and standard deviations are all over the place with only 10 draws. In fact, in this case, we know that the mean of 10 standard normal draws is distributed as normal(0, 1 / sqrt(10)). Here’s what it looks like with 1000 simulated data points rather than 10.

>  for (n in 1:10) { y <- rnorm(1000, mean=0, sd=1); print(c(mean(y), sd(y)), digits=2) }
[1] 0.033 1.005
[1] 0.036 1.010
[1] -0.029  1.025
[1] -0.0024  1.0212
[1] 0.0046 1.0166
[1] -0.081  1.021
[1] -0.013  0.995
[1] -0.0096  0.9921
[1] -0.059  1.007
[1] -0.045  1.014
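
Plugging in the numbers: 1 / sqrt(10) ≈ 0.32 and 1 / sqrt(1000) ≈ 0.032, which is roughly the spread of the recovered means in the two batches of runs above.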

What we do expect is to recover uncertainty in a calibrated way. That is, if we simulate from the model 100 times, we expect binomial(100, 0.5) of the estimated parameters to fall within their 50% posterior intervals.

I have refitted the model based on this, but the estimation does not change much, and the lambdas are still estimated around 0.25. What does that mean for the sign indeterminacy?

Thanks for sharing that thread, I will see if I can adapt the model based on those suggestions.

The dataset has ~5000 observations and 246 levels of the grouping factor (individuals) for which the latent structure is simulated. Both lavaan and blavaan are able to approximately recover the simulated lambdas when fitting the SEMs to the underlying structure of 246 observations, without the further layers of data structure and noise. So I’m hoping this is enough data to recover the simulated values, because it is close to the actual data structure I want to apply the model to. The Stan model I have now does not really get close to the blavaan output, so I’m hopeful that some tweaking of the model will get closer to the actual values.

I think this means that sign indeterminacy is not the source of your problem.

It was helpful to see the script. Your (b)lavaan models in there are fairly simple, including 6 observed variables and 1 latent variable per row of data. But then your Stan model seems to be combining multiple observed variables into e_z1 and e_z2, then modeling e_z1 and e_z2 as bivariate normal. I would think you want to use a 6-dimensional normal distribution in the Stan model, with one dimension per observed variable. But I am also not sure whether you have a different Stan and/or (b)lavaan model from what your previous script contained. It may be helpful to see a single blavaan model and a single Stan model that you expect to produce similar results.

I tried to set up the full blavaan model for transparency, but I realised the structure I’m aiming for does not translate well to blavaan syntax as far as I know (e.g., the random slopes). Therefore I provide some more detail on the equations for clarification:

I’m simulating repeated social interactions among individuals, for focals (i) and their social partners (j), for several behaviours (z1/z2). We are interested in the individual-level variance and the latent structure. Here is the base structure for the bivariate model:

z1/z2 = b_0 + I_i + (b_1 + I_s) x + I_j + e

Where I_i is the individual-level random intercept, I_s the individual-level random slope in response to social partner behaviour, and I_j the random intercept of the social partner, which is thus also estimated for the focal. In our base model we solve this equation and estimate the correlations among the 6 individual-level estimates; z1 and z2 are drawn from a multivariate distribution. For further calculations with covariances of other parameters of interest, we estimate the individual intercepts for x as x = b_0 + I_x + e. This is why the simulations have 7 variables instead of the 6 in the equations.

Now I’m aiming to expand this base model into a two-level SEM (is that the correct way to refer to it?), where the individual-level effects for both z1/z2 are caused by a latent variable \eta. For now I’m only interested in estimating one \eta and its loadings \lambda:

I = b_0 + \eta_i * \lambda + e

For this I simulated the latent part of I, \lambda * \eta + \theta, plus the desired data and error structure that fits my scenario. The blavaan model indeed does not currently match the Stan model, as it only models the latent part of I (\lambda * \eta + \theta) and not the full structure described by the other equations. As far as I’m aware I can’t fit the full structure in blavaan like this. But I hope that, while I’m adapting the model, this gives some more insight into the model I’m aiming for.

Okay, I realised I made a rookie mistake: I fitted the model with standardised data and forgot to back-transform the lambda estimates. With unstandardised/back-transformed data the lambdas are actually estimated properly. With one chain, 2000 warmup iterations, and 4000 sampling iterations I get the following estimates:

          mean  2.5%   50% 97.5%    n_eff  Rhat
lambda[1] 0.451 0.289 0.452 0.611 2877.354 1.000
lambda[2] 0.462 0.304 0.463 0.622 2378.540 1.000
lambda[3] 0.396 0.244 0.397 0.549 1541.710 1.003
lambda[4] 0.488 0.335 0.488 0.639 2029.935 1.000
lambda[5] 0.552 0.403 0.550 0.707 2667.887 1.000
lambda[6] 0.566 0.407 0.567 0.727 1843.265 1.001
lambda[7] 0.458 0.314 0.458 0.599 1885.041 1.000

So it seems the model does estimate the simulated values pretty decently. Now I have two more questions:

  1. How can I adapt this model to use a marginal likelihood to estimate the latent structure, as in Merkle et al. (2021, JSS)? This should improve the estimation as far as I understand, but the n_eff and Rhats already look fairly good, so is it recommended to implement this?

  2. If I have a parameter of interest that is calculated from estimated values, like \phi_i = \psi * \chi_i + \epsilon_i, can this be used in the structural model as \phi_i = \lambda_x * \eta_i + e_i,

or in Stan code:

vector[Ni] phi_i1 = (psi * i[,7] + i[,3]);

phi_i1 = lambda[3] * eta + epsilon[,3]; 

in the transformed parameters block, or does this require a different approach? This seems plausible to me, as it is essentially just another vector of observations that the structural part is fit to. It is, however, then a bit more difficult to know whether the estimates for this parameter are correct, as they will deviate from the simulated values.

Good to hear that you made progress on parameter recovery! I do not fully understand the model, but here are some thoughts:

  • About two-level SEM: I can see how you would have each person supplying multiple variables, with people then being nested in a “partner” group. This complicates the marginal likelihood approach from Merkle et al. 2021, because you now have to account for correlations across people (whereas, in “one-level” SEM, you assume that people are independent). The random slope that you mention a couple of posts ago also complicates the marginal approach (and you are correct that the random slope is not possible in blavaan). So I would recommend staying with what you are doing.
  • I am not quite sure what you are wanting to accomplish with phi_i1. It seems that you are defining phi_i1 one way when you declare it, then overwriting that value on the second line.

About two-level SEM: I can see how you would have each person supplying multiple variables, with people then being nested in a “partner” group. This complicates the marginal likelihood approach from Merkle et al 2021 because you now have to account for correlations across people (whereas, in “one-level” SEM, you assume that people are independent). And the random slope that you mention a couple posts ago also complicates the marginal approach

Considering that the n_eff and Rhat seem good for the lambdas, does the marginal approach offer enough of an improvement in estimation to be worth fitting? I have some deadlines coming up for this project, so I am trying to judge whether it will be worth the effort and time.

In the models fitted with the actual data, I have an additional random effect for the trial in which the focal individual and the 2 social partners play the game; the correlation between z1 and z2 will be estimated by the model at the trial level. Would the marginal approach then operate on that cluster? Even after reading the paper, I am not sure I understand how the marginal approach works and how it improves estimation.

I am not quite sure what you are wanting to accomplish with phi_i1. It seems that you are defining phi_i1 one way when you declare it, then overwriting that value on the second line

phi_i1 is one of our primary parameters of interest, alongside the random slope and the random intercept. Because the (random) slope is fitted to the partners’ behaviour, we need to recalculate the total partner effect, which is phi_i1. We want to use phi_i1 in our structural model instead of the components used to calculate it. I could provide some more details on this and the study design if that is useful.

As you say, the Stan code I proposed in my last reply overwrites the defined and calculated phi_i1, so that would not work. For covariances, we were able to use the linearity of (co)variances to calculate the covariances of this phi_i1 parameter with the random intercepts and slopes. Are there similar ways to estimate the loading for phi_i1 in the structural equation or in the generated quantities? Could the correlation/covariance matrix defined in the generated quantities be used for this, for instance?

I have been working on this SEM some more and figured out an alternative way of fitting the structural model. This appears more efficient based on the n_eff and runtime, and gives overall cleaner code:

  // Structural model (in model block)
  i[,1] ~ normal(lambda[1] * eta, theta[1]);
  i[,2] ~ normal(lambda[2] * eta, theta[2]);
  i[,3] ~ normal(lambda[3] * eta, theta[3]);
  i[,4] ~ normal(lambda[4] * eta, theta[4]);
  i[,5] ~ normal(lambda[5] * eta, theta[5]);
  i[,6] ~ normal(lambda[6] * eta, theta[6]);
  i[,7] ~ normal(lambda[7] * eta, theta[7]);

Which yields:

           mean  2.5%   50% 97.5%    n_eff Rhat
lambda[1] 0.455 0.304 0.455 0.611 4137.505    1
lambda[2] 0.465 0.311 0.464 0.623 3751.213    1
lambda[3] 0.395 0.242 0.396 0.541 4334.355    1
lambda[4] 0.486 0.331 0.485 0.638 3489.990    1
lambda[5] 0.556 0.409 0.554 0.716 3230.899    1
lambda[6] 0.569 0.413 0.570 0.731 2917.436    1
lambda[7] 0.457 0.319 0.456 0.597 3847.483    1
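
As an aside, those seven sampling statements could be collapsed into a loop over the columns of i (a sketch assuming the same declarations as in the model above):

  // Structural model, looped form (equivalent to the seven statements above)
  for (j in 1:7)
    i[, j] ~ normal(lambda[j] * eta, theta[j]);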

Next I tried adapting this new model to estimate the loadings for the parameter that is secondarily derived from other model estimates, see code below for the changed version.

transformed parameters {
  // Estimate social impact (phi) for lambda estimation
  vector[Ni] phi_i1 = (psi[1]*i2[,3]) + i2[,1];
  vector[Ni] phi_i2 = (psi[2]*i2[,3]) + i2[,2];
}
model {
  // Measurement models

  // Equation for z1 
  e_z1 = mu[1] + i[individual, 1] + (psi[1] + i[individual, 2]) .* xjk
  + i2[opponent1, 1] + i2[opponent2, 1];
  
  // Equation for z2 
  e_z2 = mu[2] + i[individual, 3] + (psi[2] + i[individual, 4]) .* xjk
  + i2[opponent1, 2] + i2[opponent2, 2];
  
  // Equation for x
  e_x  = mu[3] + i2[individual, 3];
   
  // Structural model
  i[,1] ~ normal(lambda[1] * eta, theta[1]);
  i[,2] ~ normal(lambda[2] * eta, theta[2]);
  phi_i1 ~ normal(lambda[3] * eta, theta[3]);
  i[,3] ~ normal(lambda[4] * eta, theta[4]);
  i[,4] ~ normal(lambda[5] * eta, theta[5]);
  phi_i2 ~ normal(lambda[6] * eta, theta[6]);

  // i2 priors
  to_vector(i2) ~ normal(0,1);
}

The derived parameter phi is estimated, but including it in the structural model seems to mess up the estimation of lambda:

           mean   2.5%    50%  97.5%    n_eff  Rhat
lambda[1] -0.460 -0.620 -0.458 -0.296 1561.523 1.000
lambda[2] -0.580 -0.790 -0.574 -0.404  506.677 1.000
lambda[3] -0.657 -0.857 -0.654 -0.465  527.509 1.000
lambda[4] -0.488 -0.649 -0.484 -0.336 1263.691 1.000
lambda[5] -0.651 -0.838 -0.651 -0.476  645.763 1.000
lambda[6] -0.950 -1.172 -0.948 -0.739  410.226 1.001

The signs of the estimates, the loadings, and the n_eff are off. However, this approach to estimating the loadings for phi makes more sense than the previous one. I couldn’t find any obvious coding errors, but I find it strange that the sign of the estimates has flipped. When I set the lambda prior to

lambda ~ normal(0.5, 0.1); 

the signs of the lambdas flip to:

           mean  2.5%   50% 97.5%    n_eff  Rhat
lambda[1] 0.474 0.350 0.475 0.598 1395.571 1.001
lambda[2] 0.564 0.437 0.565 0.700  862.601 1.005
lambda[3] 0.555 0.423 0.555 0.688  875.396 1.003
lambda[4] 0.498 0.378 0.497 0.618 1383.479 1.000
lambda[5] 0.610 0.481 0.609 0.740 1009.405 1.001
lambda[6] 0.720 0.585 0.719 0.856  732.790 1.001

What does this mean for the model estimation exactly, and what is the best way to deal with it? The estimates of the lambdas differ a bit from the first method, and some deviate from the simulated values again. Is there any way to solve this problem when fitting a structural model for a secondarily derived parameter?

From two posts ago: I think the work involved in getting the marginal approach working for this model will not be worth it unless you are planning to reuse the model for multiple projects.

About the sign change in the post above: the issue is basically that lambda * eta = -lambda * -eta. One of my previous posts in this thread includes a link to further discussion and some solutions. And if you have a structural model where eta is involved in regressions, then those regression weights will also change signs.

It looks like your informative normal(.5, .1) prior makes the lambda posterior distributions narrower than they would otherwise be. I often leave lambda unconstrained during estimation, then deal with the sign change in generated quantities (say, flipping the signs when lambda[1] is negative). More information about that appears in that sign indeterminacy link from my previous post.

I now have a model that seems to estimate the simulated lambdas fine and deals with the sign indeterminacy:

data {
  int<lower=1> No;  // total number of observations
  int<lower=1> Ni;  // number of individuals
  array[No] int individual; // individual identity for each observation
  array[No] int<lower=1> opponent1; // opponent 1 ID
  array[No] int<lower=1> opponent2; // opponent 2 ID
  
  // Predictors
  vector[No] xi;     // focal observed x trait (with measurement error)
  vector[No] xjk;    // average social partner x trait
  
  // Response variables
  array[No] vector[2] zs; // z1 and z2
}

parameters {
  // Fixed effects
  vector[3] mu;           // intercepts: z1, z2, x
  vector[2] psi;          // plasticity slopes for z1 and z2
  
  // Latent variable parameters
  vector[Ni] eta;             // latent factor for each individual
  vector[6] lambda;           // factor loadings 
  vector<lower=0>[6] theta;   // unique/residual variances for each trait

  // Residuals
  cholesky_factor_corr[2] LR;   // residual correlation between z1 and z2
  vector<lower=0>[2] sd_R;      // residual SDs for z1 and z2
  real<lower=0> sigma_ex;       // residual SD for xi

  // Individual-level social traits 
  matrix[Ni, 4] i; //  (eta * lambda + e)
  matrix[Ni, 3] i2; //  
}

transformed parameters {
  // Estimate social impact (phi) for lambda estimation
  vector[Ni] phi_i1 = (psi[1]*i2[,3]) + i2[,1];
  vector[Ni] phi_i2 = (psi[2]*i2[,3]) + i2[,2];
}

model {
  // Expected values
  vector[No] e_z1;
  vector[No] e_z2;
  vector[No] e_x;
  
  // Measurement models

  // Equation for z1 
  e_z1 = mu[1] + i[individual, 1] + (psi[1] + i[individual, 2]) .* xjk
  + i2[opponent1, 1] + i2[opponent2, 1];
  
  // Equation for z2 
  e_z2 = mu[2] + i[individual, 3] + (psi[2] + i[individual, 4]) .* xjk
  + i2[opponent1, 2] + i2[opponent2, 2];
  
  // Equation for x
  e_x  = mu[3] + i2[individual, 3];
  
  // Priors
  // Fixed effects
  mu ~ normal(0, 1);
  psi ~ normal(0, 1);
  
  // Latent structure
  eta ~ normal(0, 1);         // latent factor
  lambda ~ normal(0, 1);      // factor loadings
  theta ~ exponential(3);     // unique variances
  
  // Structural model
  i[,1] ~ normal(lambda[1] * eta, theta[1]);
  i[,2] ~ normal(lambda[2] * eta, theta[2]);
  phi_i1 ~ normal(lambda[3] * eta, theta[3]);
  i[,3] ~ normal(lambda[4] * eta, theta[4]);
  i[,4] ~ normal(lambda[5] * eta, theta[5]);
  phi_i2 ~ normal(lambda[6] * eta, theta[6]);

  // i2 priors
  to_vector(i2) ~ normal(0,1);

  // Residuals
  to_vector(sd_R) ~ exponential(3);
  sigma_ex ~ exponential(3);
  LR ~ lkj_corr_cholesky(1.5);
  
  // Multi-normal likelihood
  matrix[2, 2] L_sigma = diag_pre_multiply(sd_R, LR);
  array[No] vector[2] mus;
  for (o in 1:No)
      mus[o] = [e_z1[o], e_z2[o]]';
  zs ~ multi_normal_cholesky(mus, L_sigma);
  
  // Measurement error in xi
  xi ~ normal(e_x, sigma_ex);
}

generated quantities {
  // sign-corrected parameters
  vector[Ni] eta_adj = eta;
  vector[6] lambda_adj = lambda;

  if (lambda[1] < 0) {
    lambda_adj = -lambda;
    eta_adj = -eta;
  }
}

I simulated lambdas of 0.5 or 0.8, which the model recovers well. The lambda for \phi_i = \psi * \epsilon_i + x_i should be 1 or 1.6 for \lambda = 0.5 or 0.8 respectively (with \psi = 1), since \lambda_\phi = \psi * \lambda_\epsilon + \lambda_x, i.e. 1 * 0.5 + 0.5 = 1 and 1 * 0.8 + 0.8 = 1.6, if I’m not mistaken. I was working on fitting this model to 50+ simulated datasets, but ran into an error in the process. Overall, the model seems to estimate the lambdas well, though with lower n_eff:

               mean  2.5%   50% 97.5%    n_eff  Rhat
lambda_adj[1] 0.572 0.441 0.571 0.711  991.757 1.001
lambda_adj[2] 0.592 0.450 0.590 0.750  373.360 1.008
lambda_adj[3] 0.956 0.780 0.954 1.147  385.122 1.005
lambda_adj[4] 0.532 0.406 0.531 0.666 1231.400 1.000
lambda_adj[5] 0.605 0.468 0.602 0.759  400.202 1.006
lambda_adj[6] 0.926 0.745 0.924 1.122  316.068 1.005
theta[1]      0.725 0.645 0.723 0.814 1989.974 1.000
theta[2]      0.802 0.715 0.800 0.906 1523.053 1.001
theta[3]      0.736 0.627 0.737 0.845  971.704 1.000
theta[4]      0.741 0.660 0.740 0.827 2173.968 1.000
theta[5]      0.777 0.693 0.774 0.870 1530.546 1.001
theta[6]      0.834 0.729 0.833 0.943 1585.082 1.001

Two more things:

  1. I was looking into how well the model estimates \eta_i compared to the simulated \eta, which depended on the simulated loadings (0.5 or 0.8). The values don’t exactly match, with absolute mean deviations of 0.4 and 0.2 respectively. However, the estimated and simulated \eta were highly correlated: r = 0.89 and r = 0.98, respectively. I was planning to use the estimated \eta_i in a subsequent model following an errors-in-variables approach. But as the estimated \eta_i don’t fully match the simulated values, that model struggles a bit to recover some of the simulated betas correctly and has lower n_eff. I don’t believe the model will ever estimate \eta_i exactly, but is that a problem if the estimated values are very highly correlated?

  2. One of my goals was to compare different SEM fits, which should work with a leave-one-out cross-validation (LOO) approach based on the model log-likelihood, as was also described in the post you linked to previously:

// marginal log-likelihood based on sign-corrected parameters
Phi_lv = quad_form_diag(Rho, Sd_d);
lambda_phi_lambda = quad_form_sym(Phi_lv, transpose(lam));
theta_del = diag_matrix(sd_p);

Sigma = lambda_phi_lambda + theta_del;
L_Sigma_model = cholesky_decompose(Sigma);

for (i in 1:N) {
  log_lik[i] = multi_normal_cholesky_lpdf(X[i] | b, L_Sigma_model);
}

log_lik0 = sum(log_lik);  // global log-likelihood
dev = -2 * log_lik0;      // model deviance

Would this work similarly for my model, given that the latent structure is estimated at the individual level and there is only one latent variable?

  1. About the \eta parameters: I imagine that there will be shrinkage because the prior on those parameters is N(0,1). If so, then as you add more observed variables per individual (beyond 6), I would expect the shrinkage to disappear and the recovery to improve.

  2. The code that you posted looks like it could handle a single individual, but you also want to consider, e.g., how individual 1 in group 1 is correlated with individual 2 in group 1. I think you want to specify a multivariate normal over all the observed variables in each group; the loop in your code would then be over groups instead of individuals. This is a place where I find it easier to write down the model and work out the marginal distributions on paper; a rough sketch of the shape is below.
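
A minimal sketch of that shape (hypothetical names throughout; Ng groups, with y_group[g] stacking the observed variables for all members of group g, and mu_group / Sigma_group the model-implied mean and covariance worked out on paper, including the cross-individual correlations within a group):

  vector[Ng] log_lik;
  for (g in 1:Ng)
    log_lik[g] = multi_normal_lpdf(y_group[g] | mu_group[g], Sigma_group);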

For 1: The observed data I have has a similar structure and number of observations and individuals to the data I simulated. So this cannot be improved upon, and how well the estimated eta scores reflect the true eta will then depend on the magnitude of the lambdas.

For 2: I am now using the following Stan code in the generated quantities to estimate the log-likelihood for LOO-CV model comparison:

// Model log-likelihood for leave-one-out cross-validation (LOO-CV)
  vector[No] log_lik;
  matrix[2, 2] L_sigma = diag_pre_multiply(sd_R, LR);

  for (o in 1:No) {
    vector[2] mu_z;
    mu_z[1] = mu[1] + i[individual[o], 1] + (psi[1] + i[individual[o], 2]) * xjk[o]
              + i2[opponent1[o], 1] + i2[opponent2[o], 1];
    mu_z[2] = mu[2] + i[individual[o], 3] + (psi[2] + i[individual[o], 4]) * xjk[o]
              + i2[opponent1[o], 2] + i2[opponent2[o], 2];

    log_lik[o] = multi_normal_cholesky_lpdf(zs[o] | mu_z, L_sigma);
  }

As the structural model is applied to the “i” part of the model, different structural models should be comparable this way, I think. How would I expand this code to accommodate your suggestion?

  3. After my last post I noticed that fitting the following adjusted code in the model block yielded better estimation:
  // Structural model
  i[,1] ~ normal(lambda[1] * eta, theta[1]);
  i[,2] ~ normal(lambda[2] * eta, theta[2]);
  ((psi[1]*i2[,3]) + i2[,1]) ~ normal(lambda[3] * eta, theta[3]);
  i[,3] ~ normal(lambda[4] * eta, theta[4]);
  i[,4] ~ normal(lambda[5] * eta, theta[5]);
  ((psi[2]*i2[,3]) + i2[,2]) ~ normal(lambda[6] * eta, theta[6]);

However, after taking a further look at the other model parameters, I realised that even though the lambdas are estimated correctly, some of the fixed effects, like the covariate slope and the residual correlation, were no longer estimated well compared to the base model and to the other structural models fitted on only the i-derived parameters (pasted below):

  // Structural model
  i[,1] ~ normal(lambda[1] * eta, theta[1]);
  i[,2] ~ normal(lambda[2] * eta, theta[2]);
  i[,3] ~ normal(lambda[3] * eta, theta[3]);
  i[,4] ~ normal(lambda[4] * eta, theta[4]);
  i[,5] ~ normal(lambda[5] * eta, theta[5]);
  i[,6] ~ normal(lambda[6] * eta, theta[6]);
  i[,7] ~ normal(lambda[7] * eta, theta[7]);

or

transformed parameters {
// Structural model
matrix[Ni, 7] i =  eta * lambda' + epsilon;
}

So it seems that the structural estimation for the composite variable phi is somehow affecting the estimation of the other parameters, which is troublesome. Would anyone have an idea what is causing this? I should be able to estimate the lambda loading for phi from the lambdas and psi of the components that form phi, in the generated quantities, like this:

// Estimate loadings for phi
real lambda_phi1 = psi[1] * lambda_adj[7] + lambda_adj[3];
real lambda_phi2 = psi[2] * lambda_adj[7] + lambda_adj[6];

However, this requires that a loading for the component i[,7] be arbitrarily assigned to the eta, which is not what we want. This is especially true for situations with 2 latent variables. If anyone has an idea how to solve this issue, please let me know!

About your #2 above: in general, you want to write a likelihood that does not involve i variables in mu_z. Instead, the correlations implied by the i variables get incorporated into the multivariate normal covariance matrix. I am afraid I don’t have time to work out the marginalization right now, but I’ll see if I can get back to it.

About your #3: could you post your current model?

Here is the latest model I fitted for this analysis, pasted below. I tried putting more constraints on the model (e.g. positive loadings, and fixed small residuals for the composite parameter), but that didn’t help much with the issues I’m having. If you have any further ideas, they are much appreciated!

data {
  int<lower=1> No;  // total number of observations
  int<lower=1> Ni;  // number of individuals
  array[No] int individual; // individual identity for each observation
  array[No] int<lower=1> opponent1; // opponent 1 ID
  array[No] int<lower=1> opponent2; // opponent 2 ID
  
  // Predictors
  vector[No] xi;     // focal observed x trait (with measurement error)
  vector[No] xjk;    // average social partner x trait
  
  // Response variables
  array[No] vector[2] zs; // z1 and z2
}

parameters {
  // Fixed effects
  vector[3] mu;           // intercepts: z1, z2, x
  vector[2] psi;          // plasticity slopes for z1 and z2
  
  // Latent variable parameters
  vector[Ni] eta;             // latent factor for each individual
  vector[6] lambda;           // factor loadings 
  vector<lower=0>[6] theta;   // unique/residual variances for each trait

  // Residuals
  cholesky_factor_corr[2] LR;   // residual correlation between z1 and z2
  vector<lower=0>[2] sd_R;      // residual SDs for z1 and z2
  real<lower=0> sigma_ex;       // residual SD for xi

  // Individual-level social traits 
  matrix[Ni, 4] i; //  (eta * lambda + e)
  matrix[Ni, 3] i2; //  
}

transformed parameters {
  // Estimate social impact (phi) for lambda estimation
  //vector[Ni] phi_i1 = (psi[1]*i2[,3]) + i2[,1];
  //vector[Ni] phi_i2 = (psi[2]*i2[,3]) + i2[,2];
}

model {
  // Expected values
  vector[No] e_z1;
  vector[No] e_z2;
  vector[No] e_x;
  
  // Measurement models

  // Equation for z1 
  e_z1 = mu[1] + i[individual, 1] + (psi[1] + i[individual, 2]) .* xjk
  + i2[opponent1, 1] + i2[opponent2, 1];
  
  // Equation for z2 
  e_z2 = mu[2] + i[individual, 3] + (psi[2] + i[individual, 4]) .* xjk
  + i2[opponent1, 2] + i2[opponent2, 2];
  
  // Equation for x
  e_x  = mu[3] + i2[individual, 3];
  
  // Priors
  // Fixed effects
  mu ~ normal(0, 1);
  psi ~ normal(0, 1);
  
  // Latent structure
  eta ~ normal(0, 1);         // latent factor
  lambda ~ normal(0, 1);      // factor loadings
  theta ~ exponential(3);     // unique variances
  
  // Structural model
  i[,1] ~ normal(lambda[1] * eta, theta[1]);
  i[,2] ~ normal(lambda[2] * eta, theta[2]);
  ((psi[1]*i2[,3]) + i2[,1]) ~ normal(lambda[3] * eta, theta[3]);
  i[,3] ~ normal(lambda[4] * eta, theta[4]);
  i[,4] ~ normal(lambda[5] * eta, theta[5]);
  ((psi[2]*i2[,3]) + i2[,2]) ~ normal(lambda[6] * eta, theta[6]);

  // i2 priors
  to_vector(i2) ~ normal(0,1);

  // Residuals
  to_vector(sd_R) ~ exponential(3);
  sigma_ex ~ exponential(3);
  LR ~ lkj_corr_cholesky(1.5);
  
  // Multi-normal likelihood
  matrix[2, 2] L_sigma = diag_pre_multiply(sd_R, LR);
  array[No] vector[2] mus;
  for (o in 1:No)
      mus[o] = [e_z1[o], e_z2[o]]';
  zs ~ multi_normal_cholesky(mus, L_sigma);
  
  // Measurement error in xi
  xi ~ normal(e_x, sigma_ex);
}

generated quantities {
  // sign-corrected parameters
  vector[Ni] eta_adj = eta;
  vector[6] lambda_adj = lambda;

  if (lambda[1] < 0) {
    lambda_adj = -lambda;
    eta_adj = -eta;
  }

  // Model log-likelihood for leave-one-out cross-validation (LOO-CV)
  vector[No] log_lik;
  matrix[2, 2] L_sigma = diag_pre_multiply(sd_R, LR);

  for (o in 1:No) {
    vector[2] mu_z;
    mu_z[1] = mu[1] + i[individual[o], 1] + (psi[1] + i[individual[o], 2]) * xjk[o]
              + i2[opponent1[o], 1] + i2[opponent2[o], 1];
    mu_z[2] = mu[2] + i[individual[o], 3] + (psi[2] + i[individual[o], 4]) * xjk[o]
              + i2[opponent1[o], 2] + i2[opponent2[o], 2];

    log_lik[o] = multi_normal_cholesky_lpdf(zs[o] | mu_z, L_sigma);
  }
}