# Bayesian Confirmatory Factor Analysis with known latent covariance

Hello, I am trying to learn Bayesian methods using RStan to tweak examples I have seen online.

In particular I am trying to recreate this example of Bayesian CFA, except I want to encode the fact I do have some existing knowledge about the off-diagonal elements of the correlation structure of the latent factors.

My current challenge seems pretty straightforward: I have tweaked the CFA model above to have a latent covariance structure instead (so I can draw this from an Inverse Weibull and encode my knowledge), and fixed a single factor loading of each factor to 1.

However, for some reason my code doesn’t run. It compiles fine but once I hit run It stays stuck, doesn’t produce an error either. Not sure where I went wrong here! My Stan and R code (should be reproducible, I’ve removed the Wishart sampling for simplicity and kept the “known” latent covariance matrix fixed and even this doesn’t work):

``````//confirmatory factor analysis code
data{
// N: number of subjects
int N ;
// k: number of items
int k ;
// y: matrix of outcomes
matrix[N,k] y ;
// d: number of latent factors
int d ;
// y_fac: list of which factor is associated with each outcome
int<lower=1,upper=d> y_fac[k] ;
// Prior matrix for latent covariance
matrix[d,d] Sigma_z ;
}

transformed data{

// scaled_y
matrix[N,k] scaled_y ;
vector[d] zeros;

for(i in 1:k){
scaled_y[,i] = (y[,i]-mean(y[,i])) / sd(y[,i]) ;
}

zeros = rep_vector(0, 3);

}

parameters{
// latent variables
matrix[N,d] Z ;
// scaled_y_means: means for each outcome
vector[k] scaled_y_means ;
// scaled_y_noise: measurement noise for each outcome
vector<lower=0>[k] scaled_y_noise ;
// betas: how each factor loads on to each outcome.
//  Must be postive for identifiability.
vector<lower=0>[k] betas ;
}

transformed parameters{
vector<lower=0>[k] fixed_betas;

fixed_betas = betas;
fixed_betas = 1;

for (i in 2:k){
if (y_fac[i] == y_fac[i-1])
fixed_betas[i] = betas[i];
else
fixed_betas[i] = 1;
}
}

model{

// draw latent factors
for(i in 1:N){
Z[i,] ~ multi_normal(zeros, Sigma_z) ;
}
// normal prior on y means
scaled_y_means ~ normal(0,1) ;
// weibull prior on measurement noise
scaled_y_noise ~ weibull(2,1) ;
// normal(0,1) priors on betas
betas ~ normal(0,1) ;
// each outcome as normal, influenced by its associated latent factor
for(i in 1:N){
for(j in 1:k){
scaled_y[,j] ~ normal(
scaled_y_means[j] + Z[i,y_fac[j]] * fixed_betas[j],
scaled_y_noise[j]);
}
}
}

generated quantities{
// y_means: outcome means on the original scale of Y
vector[k] y_means ;
// y_noise: outcome noise on the original scale of Y
vector[k] y_noise ;
for(i in 1:k){
y_means[i] = scaled_y_means[i]*sd(y[,i]) + mean(y[,i]) ;
y_noise[i] = scaled_y_noise[i]*sd(y[,i]) ;
}
}
``````
``````# In R:
# Load necessary libraries and set up multi-core processing for Stan
set.seed(123)
options(warn=-1, message =-1)
library(dplyr); library(ggplot2); library(rstan); library(reshape2); require(lavaan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

stan_cfa_model <- stan_model("stan_cfa.stan") # refer to Stan code above

HolzingerSwineford1939 %>%
dplyr::select(
x1,x2,x3,x4,x5,x6,x7,x8,x9
) -> dat

stan_dat <- list(
# n_y: number of outcomes
N = nrow(dat)
# n_subj: number of subjects
, k = ncol(dat)
# y: matrix of outcomes
, y = as.matrix(dat)
# n_fac: number of latent factors
, d = 3
# y_fac: list of which factor is associated with each outcome
, y_fac = c(1,1,1,2,2,2,3,3,3)
# ''known'' latent covariance matrix
, Sigma_z = matrix(c(1.0,0.5,0.0,
0.5,1.0,0.5,
0.0,0.5,1.0),3,3)
)

stan_fit <- sampling(stan_cfa_model, stan_dat,
cores = 3, chains = 3, iter = 4000,
refresh = 0)
``````

You need and `i` as in `scaled_y[i, j] ~ normal( ...`

Even more importantly, I think you only want a size d vector for `Z` (ie `vector[d] Z`) since you index by `y_fac[j]` for all `i in N` in the model.

I modified it using cholesky factors. Not sure if this will work well for your expanded model or not. It runs pretty quick.

``````//confirmatory factor analysis code
data{
// N: number of subjects
int N ;
// k: number of items
int k ;
// y: matrix of outcomes
matrix[N,k] y ;
// d: number of latent factors
int d ;
// y_fac: list of which factor is associated with each outcome
int<lower=1,upper=d> y_fac[k] ;
// Prior matrix for latent covariance
matrix[d, d] Sigma_z ;
}

transformed data{
// scaled_y
matrix[N, k] scaled_y ;
vector[d] zeros = rep_vector(0, d);
cholesky_factor_corr[d] L_z = cholesky_decompose(Sigma_z);

for(i in 1:k){
scaled_y[,i] = (y[,i]-mean(y[,i])) / sd(y[,i]) ;
}
}
parameters{
// latent variables
vector[d] Z ;
// scaled_y_means: means for each outcome
vector[k] scaled_y_means ;
// scaled_y_noise: measurement noise for each outcome
vector<lower=0>[k] scaled_y_noise ;
// betas: how each factor loads on to each outcome.
//  Must be postive for identifiability.
vector<lower=0>[k] betas ;
}
transformed parameters{
vector<lower=0>[k] fixed_betas;

fixed_betas = betas;
fixed_betas = 1;

for (i in 2:k){
if (y_fac[i] == y_fac[i-1])
fixed_betas[i] = betas[i];
else
fixed_betas[i] = 1;
}
}

model{
// draw latent factors
Z ~ multi_normal_cholesky(zeros, L_z) ;
// normal prior on y means
scaled_y_means ~ normal(0,1) ;
// weibull prior on measurement noise
scaled_y_noise ~ weibull(2,1) ;
// normal(0,1) priors on betas
betas ~ normal(0,1) ;
// each outcome as normal, influenced by its associated latent factor
for (j in 1:k)
scaled_y[ , j] ~ normal(
scaled_y_means[j] + Z[y_fac[j]] * fixed_betas[j],
scaled_y_noise[j]);
}
generated quantities{
// y_means: outcome means on the original scale of Y
vector[k] y_means ;
// y_noise: outcome noise on the original scale of Y
vector[k] y_noise ;
for(i in 1:k){
y_means[i] = scaled_y_means[i]*sd(y[,i]) + mean(y[,i]) ;
y_noise[i] = scaled_y_noise[i]*sd(y[,i]) ;
}
}
``````