Dear folks,
I am trying to fit Bayesian Structural Equation Modeling (BSEM) models in Stan. I can easily do this thanks to Ed Merkle’s awesome package blavaan, but I hope to fit BSEM models with a data-augmentation approach that Song and Lee (2012) uses.
Song, X.-Y., & Lee, S.-Y. (2012). Basic and Advanced Bayesian Structural Equation Modeling: With Applications in the Medical and Behavioral Sciences. West Sussex, United Kindom: John Wiley & Son, Ltd.
I was lucky to find the previous issue (Non-convergence of latent variable model - #4 by Reg_John), and several posts by @Mauricio_Garnier-Villarre were quite helpful. Still, I would love to get some advice on how to program the model exactly.
My scenario is to set three explanatory latent variables (ksi) and one outcome latent variable (eta) where the former predicts the latter.
You can find the code I used to simulate data below:
# 1. Load packages
require(blavaan)
require(rstan)
options(mc.cores = parallel::detectCores())
rstan_options(auto_write = TRUE)
# 2. Target model
model.target <- '
# measurement equation
ksi1 =~ x1 + 0.5*x2 + 0.5*x3
ksi2 =~ x4 + 0.5*x5 + 0.5*x6
ksi3 =~ x7 + 0.5*x8 + 0.5*x9
eta =~ y1 + 0.5*y2 + 0.5*y3
# structural equation
eta ~ 0.5*ksi1 + 0.5*ksi2 + 0.5*ksi3
'
# 3. Simulate data
set.seed(322)
sim.dat.big <- simulateData(model.target, sample.nobs = 3000)
My goal is to have almost the same estimates as blavaan produces from the code below:
# 4. Specify a model in practice
model.practice <- '
# measurement equation
ksi1 =~ x1 + x2 + x3
ksi2 =~ x4 + x5 + x6
ksi3 =~ x7 + x8 + x9
eta =~ y1 + y2 + y3
# structural equation
eta ~ ksi1 + ksi2 + ksi3
'
# 5. Model fitting
set.seed(322) # reproducibility
blav.fit <- bsem(model.practice, data=sim.dat.big, n.chains=2)
summary(blav.fit)
So far, I have programmed the Stan model like below:
data{
int N; // sample size
int P; // number of variables
matrix[N, P] Y; // data matrix Y with N rows and P columns
int D; // number of total latent variables
int K; // number of explanatory latent variables (ksi)
int E; // number of outcome latent variables (eta)
}
parameters{
vector[P] nu; // intercepts of observed variables
vector[P-D] lam; // factor loadings
vector[K] beta; // structural regressions
matrix[N, K] FS_K; // factor scores of explanatory latent variables
vector[N] FS_E[E]; // factor scores of outcome latent variables
cov_matrix[D] cov_K; // covariance matrix among explanatory latent variables
vector<lower=0>[P] var_P; // variance for observed variables
matrix<lower=0>[K, K] var_K; // covariance for explanatory latent variables
vector<lower=0>[E] var_E; // variance for outcome latent variables
}
transformed parameters{
vector[N] mu_pred[P]; // predicted values of observed variables
vector[N] nu_pred; // predicted values of eta
for(i in 1:N){
mu_pred[i,1] = nu[1] + FS_K[i,1];
mu_pred[i,2] = nu[2] + lam[1]*FS_K[i,1];
mu_pred[i,3] = nu[3] + lam[2]*FS_K[i,1];
mu_pred[i,4] = nu[4] + FS_K[i,2];
mu_pred[i,5] = nu[5] + lam[3]*FS_K[i,2];
mu_pred[i,6] = nu[6] + lam[4]*FS_K[i,2];
mu_pred[i,7] = nu[7] + FS_K[i,3];
mu_pred[i,8] = nu[8] + lam[5]*FS_K[i,3];
mu_pred[i,9] = nu[9] + lam[6]*FS_K[i,3];
mu_pred[i,10] = nu[10] + FS_E[i];
mu_pred[i,11] = nu[11] + lam[7]*FS_E[i];
mu_pred[i,12] = nu[12] + lam[8]*FS_E[i];
}
for(i in 1:N){
nu_pred[i] = beta[1]*xi[i,1]+beta[2]*xi[i,2]+beta[3]*xi[i,3];
}
}
model{
// hyperpriors
R = diag_matrix(rep_vector(1, K))
vector[K] u;
u = rep_vector(0, K);
// priors on intercepts
for (i in 1:P) {nu[i] ~ normal(0, 10)};
// priors on factor loadings
for (i in 1:(P-D)) {lam[i] ~ normal(0, 10)};
// priors on regression coefficients
for (i in 1:K) {beta[i] ~ normal(0, 10)};
// priors on variance
for(i in 1:P) {var_P[i] ~ gamma(1, 0.5)};
var_E ~ gamma(1, 0.5);
var_K ~ wishart(R, 12);
// likelihood
for(i in 1:N){
for(j in 1:P){
Y[i, j] ~ normal(mu_pred[i,j],var_P[j]);
xi[i, 1:3] ~ multi_normal(u, var_K);
eta[i] ~ normal(nu_pred[i], var_E);
}
}
}
When I used this specification, the rstan gives me an error saying that there exists a dimension mismatch. However, I cannot figure out where it exists, so could anyone help me get the same results with this Stan code as those from the blavaan? Any additional feedback is appreciated.
Thank you a lot for reading this issue!