Specification of Bayesian SEM models with a data-augmentation approach

Dear folks,

I am trying to fit Bayesian Structural Equation Modeling (BSEM) models in Stan. I can easily do this thanks to Ed Merkle's awesome package blavaan, but I hope to fit BSEM models with the data-augmentation approach that Song and Lee (2012) use.

Song, X.-Y., & Lee, S.-Y. (2012). Basic and Advanced Bayesian Structural Equation Modeling: With Applications in the Medical and Behavioral Sciences. West Sussex, United Kingdom: John Wiley & Sons, Ltd.

I was lucky to find a previous thread (Non-convergence of latent variable model - #4 by Reg_John), and several posts by @Mauricio_Garnier-Villarre there were quite helpful. Still, I would love some advice on how to program the model exactly.

My scenario has three explanatory latent variables (ksi) and one outcome latent variable (eta), where the former predict the latter.

You can find the code I used to simulate data below:

# 1. Load packages
require(blavaan)
require(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)

# 2. Target model
model.target <- '
# measurement equation
ksi1 =~ x1 + 0.5*x2 + 0.5*x3
ksi2 =~ x4 + 0.5*x5 + 0.5*x6
ksi3 =~ x7 + 0.5*x8 + 0.5*x9
eta  =~ y1 + 0.5*y2 + 0.5*y3

# structural equation
eta ~ 0.5*ksi1 + 0.5*ksi2 + 0.5*ksi3
'

# 3. Simulate data
set.seed(322)
sim.dat.big <- simulateData(model.target, sample.nobs = 3000)

My goal is to obtain almost the same estimates as those blavaan produces with the code below:

# 4. Specify a model in practice
model.practice <- '
# measurement equation
ksi1 =~ x1 + x2 + x3
ksi2 =~ x4 + x5 + x6
ksi3 =~ x7 + x8 + x9
eta  =~ y1 + y2 + y3

# structural equation
eta ~ ksi1 + ksi2 + ksi3
'

# 5. Model fitting
set.seed(322) # reproducibility
blav.fit <- bsem(model.practice, data=sim.dat.big, n.chains=2)
summary(blav.fit)

So far, I have programmed the Stan model as below:

data{
  int N; // sample size
  int P; // number of variables
  matrix[N, P] Y; // data matrix Y with N rows and P columns
  int D; // number of total latent variables
  int K; // number of explanatory latent variables (ksi)
  int E; // number of outcome latent variables (eta)
}

parameters{
  vector[P] nu; // intercepts of observed variables
  vector[P-D] lam; // factor loadings
  vector[K] beta; // structural regressions
  matrix[N, K] FS_K; // factor scores of explanatory latent variables
  vector[N] FS_E[E]; // factor scores of outcome latent variables
  cov_matrix[D] cov_K; // covariance matrix among explanatory latent variables
  vector<lower=0>[P] var_P; // variance for observed variables
  matrix<lower=0>[K, K] var_K; // covariance for explanatory latent variables
  vector<lower=0>[E] var_E; // variance for outcome latent variables
}

transformed parameters{
  vector[N] mu_pred[P]; // predicted values of observed variables
  vector[N] nu_pred; // predicted values of eta

  for(i in 1:N){
    mu_pred[i,1] = nu[1] + FS_K[i,1];
    mu_pred[i,2] = nu[2] + lam[1]*FS_K[i,1];
    mu_pred[i,3] = nu[3] + lam[2]*FS_K[i,1];
    mu_pred[i,4] = nu[4] + FS_K[i,2];
    mu_pred[i,5] = nu[5] + lam[3]*FS_K[i,2];
    mu_pred[i,6] = nu[6] + lam[4]*FS_K[i,2];
    mu_pred[i,7] = nu[7] + FS_K[i,3];
    mu_pred[i,8] = nu[8] + lam[5]*FS_K[i,3];
    mu_pred[i,9] = nu[9] + lam[6]*FS_K[i,3];
    mu_pred[i,10] = nu[10] + FS_E[i];
    mu_pred[i,11] = nu[11] + lam[7]*FS_E[i];
    mu_pred[i,12] = nu[12] + lam[8]*FS_E[i];
}

  for(i in 1:N){
    nu_pred[i] = beta[1]*xi[i,1]+beta[2]*xi[i,2]+beta[3]*xi[i,3];
  }
}

model{
  // hyperpriors
  R = diag_matrix(rep_vector(1, K))
  vector[K] u;
  u = rep_vector(0, K);

  // priors on intercepts
  for (i in 1:P) {nu[i] ~ normal(0, 10)};
  
  // priors on factor loadings
  for (i in 1:(P-D)) {lam[i] ~ normal(0, 10)};

  // priors on regression coefficients
  for (i in 1:K) {beta[i] ~ normal(0, 10)};

  // priors on variance
  for(i in 1:P) {var_P[i] ~ gamma(1, 0.5)};
  var_E ~ gamma(1, 0.5);
  var_K ~ wishart(R, 12);

  // likelihood
  for(i in 1:N){
    for(j in 1:P){
      Y[i, j] ~ normal(mu_pred[i,j],var_P[j]);
      xi[i, 1:3] ~ multi_normal(u, var_K);
      eta[i] ~ normal(nu_pred[i], var_E);
    }
  }
}

When I used this specification, rstan gave me an error saying that there is a dimension mismatch. However, I cannot figure out where it is, so could anyone help me get the same results from this Stan code as those from blavaan? Any additional feedback is appreciated.

Thanks a lot for reading this post!

Hi iHeo

I am glad you have found some of my comments useful.

If you want to apply the data-augmentation approach, you can also do this in blavaan with the argument target="stanclassic". It was the original model specification, but blavaan moved to the marginal model as the default because it is faster.

Now, with your model, I can see a few issues. Where you specify xi, need to change it for FS_K, and where you specify eta, need to change it for FS_E.

nu_pred[i] = beta[1]*FS_K[i,1]+beta[2]*FS_K[i,2]+beta[3]*FS_K[i,3];

And you are running the loop in the the likelihood over incorrect dimensions. Only Y should loop over P

 for(i in 1:N){
    FS_E[i] ~ normal(nu_pred[i], var_E);
    FS_K[i] ~ multi_normal(u, var_K);
    for(j in 1:P){
      Y[i, j] ~ normal(mu_pred[i,j],var_P[j]);
    }
  }

After this you will have to define the generated quantities{} block to adjust the factor loadings, factor correlations, and factor regressions for factor indeterminacy.
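
For example, a rough, untested sketch of such a block, assuming all loadings are free (two per factor, in the order ksi1-ksi3 then eta) and a K by K factor covariance matrix cov_K, could look like:

generated quantities{
  // untested sketch: align the sign of each factor using its first free loading
  vector[4] s; // one sign per latent variable (ksi1, ksi2, ksi3, eta)
  vector[P-D] lam_adj; // sign-adjusted factor loadings
  vector[K] beta_adj; // sign-adjusted structural regressions
  matrix[K, K] cov_K_adj; // sign-adjusted factor covariance matrix

  s[1] = lam[1] < 0 ? -1 : 1;
  s[2] = lam[3] < 0 ? -1 : 1;
  s[3] = lam[5] < 0 ? -1 : 1;
  s[4] = lam[7] < 0 ? -1 : 1;

  for(d in 1:4){
    for(m in 1:2){
      lam_adj[2*(d-1)+m] = s[d] * lam[2*(d-1)+m];
    }
  }
  for(k in 1:K){
    beta_adj[k] = s[4] * s[k] * beta[k]; // flip if either eta or ksi_k flipped
  }
  for(i in 1:K){
    for(j in 1:K){
      cov_K_adj[i,j] = s[i] * s[j] * cov_K[i,j];
    }
  }
}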

Hope this helps


Hi @Mauricio_Garnier-Villarre!
Thanks for your reply. I am glad to see you here. I appreciate your time in checking my model specification.

If you want to apply the data-augmentation approach, you can also do this in blavaan with the argument target="stanclassic". It was the original model specification, but blavaan moved to the marginal model as the default because it is faster.

Yes, I do know this. I initially tried to edit the code from target="stanclassic", but it was not easy for me to understand and adjust the bulky code. I thus tried to keep things simple by using only the parts I need, following Song and Lee (2012).

I also agree with the issues you pointed out, and I have fixed them. Following your advice, I will now try to code the generated quantities{} block for factor indeterminacy.

Thank you a lot!

iHeo

Yeah, the blavaan Stan code can be hard to follow and modify. But because of that, its model will run faster than this type of code. This type, more similar to Song and Lee (2012), is slower but gets the job done.

Good luck, we are here to help!


Dear Stan folks and @Mauricio_Garnier-Villarre,

I am facing some error messages despite what I believe is working Stan code for the BSEM (still with the data-augmentation approach).

The Stan code I have is:

data{
  int N; // sample size
  int P; // number of variables
  matrix[N, P] Y; // data matrix Y with N rows and P columns
  int D; // number of total latent variables
  int K; // number of explanatory latent variables (ksi)
  int E; // number of outcome latent variables (eta)
  matrix[K, K] R; // diagonal matrix
}

parameters{
  vector[P] nu; // intercepts of observed variables
  vector[P-D] lam; // factor loadings
  vector[K] beta; // structural regressions
  matrix[N, K] FS_K; // factor scores of explanatory latent variables
  vector[E] FS_E; // factor scores of outcome latent variables
  vector<lower=0>[P] var_P; // variance for observed variables
  cov_matrix[K] cov_K; // covariance for explanatory latent variables
  vector<lower=0>[E] var_E; // variance for outcome latent variables
}

transformed parameters{
  matrix[N, P] mu_pred; // predicted values of observed variables
  vector[N] nu_pred; // predicted values of eta

  for(i in 1:N){
    mu_pred[i,1] = nu[1] + FS_K[i,1];
    mu_pred[i,2] = nu[2] + lam[1]*FS_K[i,1];
    mu_pred[i,3] = nu[3] + lam[2]*FS_K[i,1];
    mu_pred[i,4] = nu[4] + FS_K[i,2];
    mu_pred[i,5] = nu[5] + lam[3]*FS_K[i,2];
    mu_pred[i,6] = nu[6] + lam[4]*FS_K[i,2];
    mu_pred[i,7] = nu[7] + FS_K[i,3];
    mu_pred[i,8] = nu[8] + lam[5]*FS_K[i,3];
    mu_pred[i,9] = nu[9] + lam[6]*FS_K[i,3];
    mu_pred[i,10] = nu[10] + FS_E[i];
    mu_pred[i,11] = nu[11] + lam[7]*FS_E[i];
    mu_pred[i,12] = nu[12] + lam[8]*FS_E[i];
  }

  for(i in 1:N){
    nu_pred[i] = beta[1]*FS_K[i,1]+beta[2]*FS_K[i,2]+beta[3]*FS_K[i,3];
  }
}

model{
  // hyperpriors
  vector[K] u;
  u = rep_vector(0, K);

  // priors on intercepts
  for (i in 1:P) {nu[i] ~ normal(0, 10);}

  // priors on factor loadings
  for (i in 1:(P-D)) {lam[i] ~ normal(0, 10);}

  // priors on regression coefficients
  for (i in 1:K) {beta[i] ~ normal(0, 10);}

  // priors on variance
  for(i in 1:P) {var_P[i] ~ gamma(1, 0.5);}
  var_E ~ gamma(1, 0.5);
  cov_K ~ wishart(12, R);

  // likelihood
  for(i in 1:N){
    FS_E[i] ~ normal(nu_pred[i], var_E);
    FS_K[i] ~ multi_normal(u, cov_K);
    for(j in 1:P){
      Y[i, j] ~ normal(mu_pred[i,j],var_P[j]);
    }
  }
}

The R code I used to run the Stan model follows (please note that I saved the Stan code under the name "cfa.stan"):

stan.dat <- list(N=nrow(sim.dat.big), P=ncol(sim.dat.big), K=3, E=1, Y=as.matrix(sim.dat.big),
                 R=matrix(c(1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0), nrow=3))

stan.sem.model <- stan_model("cfa.stan")

stan.fit <- rstan::stan(file = "cfa.stan", data = stan.dat, seed = 322, cores = 2, chains = 2,
                        iter = 1000)

The error message I get is:

Error in Module(module, mustStart = TRUE) :
Failed to initialize module pointer: Error in FUN(X[[i]], …): no such symbol _rcpp_module_boot_stan_fit4model873ce7b34fc_cfa_mod in package C:/Users/IHN-WH~1/AppData/Local/Temp/RtmpoVLvSt/file873c19b36b87.dll

My questions are:
(1) Do you have any idea why these errors are happening (or what they mean)?
(2) I did not add the generated quantities{} block for factor indeterminacy (i.e., factor model identification). This is because I already fixed the first factor loading of each factor to 1 in the Stan code. I am thus wondering whether I still have to add it.
(3) If I have to address the factor indeterminacy by fixing the first factor loadings to 1 instead of fixing the factor variances to 1, how should I specify that? When I referred to the previous Stan thread (Non-convergence of latent variable model) and the Stan mailing lists, the method of fixing the factor variance to 1 was used for identification. So, I would like to receive any advice.

Thank you for your time!

iHeo

You had 2 mistakes in your code; I think the error is due to this, just with an unclear error message in this case.

The dimension of FS_E should be N instead of E:

vector[N] FS_E; // factor scores of outcome latent variables

And you need to include D=4 in the stan.dat object.
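
That is, something like:

stan.dat <- list(N=nrow(sim.dat.big), P=ncol(sim.dat.big), D=4, K=3, E=1,
                 Y=as.matrix(sim.dat.big),
                 R=diag(3)) # 3 x 3 identity matrix, as in your original call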

About the indeterminacy: I don't usually use a marker variable for identification; I prefer the factor variance parameterization of the model. So, I am not sure whether the marker variable would be enough to fix factor indeterminacy. I would suggest at least running the model as is, then checking the traceplots for the factor loadings, for example, and seeing if there is sign switching.


Thanks a lot for your kind help, @Mauricio_Garnier-Villarre!

I see that the code works, though the computational time is quite long.
There are two current issues. First, when I tried to inspect convergence, the fitted Stan object did not work, with the error message Error in mcmc.list(x) : Arguments must be mcmc objects. I think I can solve this issue by referring to other Stan threads. Second, the Stan session is oftentimes aborted; this might have something to do with local RAM limits or the size of the Stan fit objects.
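(For the first issue, I suspect the stanfit object needs an explicit conversion before coda can use it; a sketch of what I plan to try:)

fit.mcmc <- rstan::As.mcmc.list(stan.fit) # convert the stanfit object to a coda mcmc.list
coda::gelman.diag(fit.mcmc)               # then the usual coda convergence diagnostics work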
Once I clear these issues, I will share some information about convergence with the current factor model identification.
Thanks a lot!

Cheers,
Ihnwhi

Ihnwhi

A reason your Stan object is too large is that you are saving all the parameters in the model. Most commonly we are only interested in the structural parameters. You can choose to save only those by adding the pars argument:

pars <- c("lam","beta","var_E","cov_K","var_P","nu")

stan.fit <- stan(file = "cfa.stan", 
                 data = stan.dat, 
                 pars = pars, # save only these parameters
                 seed = 322, 
                 cores = 3, 
                 chains = 3,
                 iter = 1000)

Then you can check the traceplots for a specific set of parameters:

traceplot(stan.fit, pars="lam")

Lastly, if you want to speed up your code, the main slowdown is probably the use of the multi_normal distribution. As the dimensions increase, this distribution gets slower; if you want to speed it up, you should switch to the multi_normal_cholesky parameterization for multivariate normals, which is a lot faster.

Take care


I appreciate your advice, Mauricio (@Mauricio_Garnier-Villarre)!

Yes! I also realized that I was saving all values, even including the factor scores. So, I am currently trying to fit the Stan object with only the "lam", "beta", and "nu" parameters to keep the object size down.

Your last tip for decreasing the computational time is so helpful! I had been trying to figure out how to increase the speed, so I will try to program the Cholesky decomposition instead of using multivariate normal densities. I think I can do it, but if I encounter some difficulties, please let me ask here again.

Many thanks!

Ihnwhi

In the previous thread you referenced, in my post Convergence CFA, you can see the example syntax I used with the Cholesky decomposition, and how to transform it into correlations in the generated quantities{} block.
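
In your case, where cov_K is estimated directly as a covariance matrix, a minimal sketch of that conversion would be:

generated quantities{
  matrix[K, K] cor_K; // factor correlations implied by cov_K
  for(i in 1:K){
    for(j in 1:K){
      cor_K[i,j] = cov_K[i,j] / sqrt(cov_K[i,i] * cov_K[j,j]);
    }
  }
}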


Thanks, Mauricio (@Mauricio_Garnier-Villarre)!

I referred to your Stan program. In your multilevel CFA program, the correlation matrix among factors is handled through a Cholesky decomposition. In my case, then, is it okay to apply the Cholesky decomposition to the covariance matrix among the explanatory latent variables? I quickly programmed this as below (with a K by K matrix L for the Cholesky decomposition; the data{} and parameters{} blocks are unchanged). Is this program doing exactly the same thing as the one with multivariate normal densities?

transformed parameters{
  matrix[N, P] mu_pred; // predicted values of observed variables
  vector[N] nu_pred; // predicted values of eta
  matrix[K, K] L;
  L = cholesky_decompose(cov_K);

  for(i in 1:N){
    mu_pred[i,1] = nu[1] + FS_K[i,1];
    mu_pred[i,2] = nu[2] + lam[1]*FS_K[i,1];
    mu_pred[i,3] = nu[3] + lam[2]*FS_K[i,1];
    mu_pred[i,4] = nu[4] + FS_K[i,2];
    mu_pred[i,5] = nu[5] + lam[3]*FS_K[i,2];
    mu_pred[i,6] = nu[6] + lam[4]*FS_K[i,2];
    mu_pred[i,7] = nu[7] + FS_K[i,3];
    mu_pred[i,8] = nu[8] + lam[5]*FS_K[i,3];
    mu_pred[i,9] = nu[9] + lam[6]*FS_K[i,3];
    mu_pred[i,10] = nu[10] + FS_E[i];
    mu_pred[i,11] = nu[11] + lam[7]*FS_E[i];
    mu_pred[i,12] = nu[12] + lam[8]*FS_E[i];
  }

  for(i in 1:N){
    nu_pred[i] = beta[1]*FS_K[i,1]+beta[2]*FS_K[i,2]+beta[3]*FS_K[i,3];
  }
}

model{
  // hyperpriors
  vector[K] u;
  u = rep_vector(0, K);

  // priors on intercepts
  for (i in 1:P) {nu[i] ~ normal(0, 10);}

  // priors on factor loadings
  for (i in 1:(P-D)) {lam[i] ~ normal(0, 10);}
 
  // priors on regression coefficients
  for (i in 1:K) {beta[i] ~ normal(0, 10);}
  
  // priors on variance
  for(i in 1:P) {var_P[i] ~ gamma(1, 0.5);}
  var_E ~ gamma(1, 0.5);
  cov_K ~ wishart(12, R);

  // likelihood
  for(i in 1:N){
    FS_E[i] ~ normal(nu_pred[i], var_E);
    FS_K[i] ~ multi_normal_cholesky(u, L);
      for(j in 1:P){
        Y[i, j] ~ normal(mu_pred[i,j],var_P[j]);
      }
  }
}

Yes, this should work, and you should see a speed-up with it.


Yes, this should work, and you should see a speed-up with it.

That’s good, thanks! I will try it!

Yes, this should work, and you should see a speed-up with it.

I can say that the computation time has decreased by one hour! The estimates are very slightly different, though.

I am now wondering what goes on behind the scenes in this code. Are the residual variances and covariances also being estimated, even though I did not specify them explicitly? If so, how may I extract the estimates?

Glad it's working faster. It is expected for the parameters to be slightly different for 2 main reasons: (a) the stochastic nature of MCMC, and (b) the model parameterization is not the same. What you do want to see is that the posterior distributions mostly overlap.
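
For example, a quick visual check (a sketch; fit.mn and fit.chol are hypothetical names for the multi_normal and Cholesky fits):

# overlay the posterior densities of the first regression coefficient from both fits
beta.mn   <- rstan::extract(fit.mn, pars = "beta")$beta[, 1]
beta.chol <- rstan::extract(fit.chol, pars = "beta")$beta[, 1]
plot(density(beta.mn), main = "beta[1] under both parameterizations")
lines(density(beta.chol), lty = 2)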

Yeah, you have them in the model; you just need to add them to the saved parameters:

pars <- c("lam",   ## factor loadings
          "beta",  ## regression coefficients
          "var_E", ## residual variance of the outcome factor
          "cov_K", ## variance/covariance matrix of the predictor factors
          "var_P", ## residual variances of the indicators
          "nu")    ## intercepts of the indicators

stan.fit <- stan(file = "cfa.stan", 
                 data = stan.dat, 
                 pars = pars, # save only these parameters
                 seed = 322, 
                 cores = 3, 
                 chains = 3,
                 iter = 500)
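
Then you can pull the posterior summaries of, say, the residual variances and the factor (co)variances with:

print(stan.fit, pars = c("var_P", "cov_K"))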

Glad it's working faster. It is expected for the parameters to be slightly different for 2 main reasons: (a) the stochastic nature of MCMC, and (b) the model parameterization is not the same. What you do want to see is that the posterior distributions mostly overlap.

True. I thought the first reason might be the cause, but it's good to know about the second reason too. Thanks for your clarification! When I compared the posterior densities, they almost completely overlap, which is a good sign.

Yeah, you have them in the model; you just need to add them to the saved parameters.

Ah, agreed. I was thinking of a covariance matrix that encompasses both the explanatory and outcome latent variables, but I can indeed see them separately. Then, is it true that I can see the residual variances of the indicators but not the covariances among the indicators, since those were not parameterized in the code?

Thank you for the time you spent on my questions.

That is expected, good. To clarify, the model is the same, but some differences like priors etc. might lead to minor differences.

That is correct; your model has independent indicators (after conditioning on the underlying factors).


Thanks a lot for your clarification, Mauricio (@Mauricio_Garnier-Villarre)!!!

Please let me ask one last question about setting constraints on, for example, the covariances among the explanatory latent variables. For now, I freely estimated the covariance among the explanatory latent variables. If I want to fix a certain covariance among the explanatory latent variables to 0, I think the way is to move cov_matrix[K] cov_K; // covariance for explanatory latent variables from the parameters{} block to the data{} block so that I can manually input the constraints. I think this should work in theory, but are there any other, more efficient methods of doing this?

Estimating constrained matrices can be cumbersome. How I would do it would be by setting each estimated individual covariance as a separate parameter (maybe with a uniform prior to start), and then building the covariance matrix in the transformed parameters block, so that the new matrix has several 0s and only the desired covariances are estimated.
The issue with this method is that it can lead to more rejections during sampling, as it's harder to ensure the matrix is positive definite.
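
A minimal sketch of this first approach (hypothetical names var_ksi and cov_12; here only the ksi1-ksi2 covariance is free, and cov_K replaces the cov_matrix parameter):

parameters{
  vector<lower=0>[K] var_ksi; // factor variances
  real cov_12; // the single free covariance (flat prior if none is given)
}
transformed parameters{
  matrix[K, K] cov_K = diag_matrix(var_ksi); // start from a diagonal matrix
  cov_K[1,2] = cov_12; // free covariance between ksi1 and ksi2
  cov_K[2,1] = cov_12; // keep the matrix symmetric
  // the other off-diagonals stay at 0; since cov_K is no longer declared
  // cov_matrix, positive definiteness is not guaranteed (hence the rejections)
}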

Another possible method would be to estimate each factor as univariate, and approximate the specific covariances from phantom factors (a rough sketch follows the list below):

  • have a factor with its mean fixed at 0 and its sd fixed at 1, i.e., N(0, 1)
  • have this factor defined by the 2 first-order factors that you want to have covariance between
  • have the 2 factor loadings be equal for both factors
  • the square LAM^2 of this equated factor loading is the covariance between the 2 factors
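
A rough, untested sketch of the phantom-factor idea for two factors (hypothetical names FS_ph, lam_ph, sd_ksi1, sd_ksi2):

parameters{
  vector[N] FS_ph; // phantom factor scores
  real lam_ph; // equated loading of both factors on the phantom factor
}
model{
  FS_ph ~ normal(0, 1); // phantom factor fixed to N(0, 1)
  for(i in 1:N){
    // both first-order factors load on the phantom with the same loading,
    // which implies cov(ksi1, ksi2) = lam_ph^2
    FS_K[i,1] ~ normal(lam_ph * FS_ph[i], sd_ksi1);
    FS_K[i,2] ~ normal(lam_ph * FS_ph[i], sd_ksi2);
  }
}
generated quantities{
  real cov_12 = lam_ph^2; // implied covariance between the 2 factors
}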

Thanks for your suggestions!

Estimating constrained matrices can be cumbersome. How I would do it would be by setting each estimated individual covariance as a separate parameter (maybe with a uniform prior to start), and then building the covariance matrix in the transformed parameters block, so that the new matrix has several 0s and only the desired covariances are estimated.

This method is quite intuitive! The expected concern, then, is that the code gets bulky.

Another possible method would be to estimate each factor as univariate, and approximate the specific covariances from phantom factors.

This method sounds cool! It should be similar to what blavaan does with phantom variables. Still, I am afraid I didn't fully get your point.

  • have this factor defined by the 2 first-order factors that you want to have covariance between

Could you explain more about what this point means?

  • the square LAM^2 of this equated factor loading is the covariance between the 2 factors

Here, what does LAM stand for?

Thank you!