Summary
I’m looking for advice on how to resolve some problems I’m encountering when fitting a CFA model with a correlation parameter between two of the measured variables. Sampling proceeds with lots of rejections of the form `Exception: cholesky_decompose: Matrix m is not positive definite`, and some of the chains return only `NaN`s for some parameters and only constant values for others. Other chains seem to sample fine, despite also having lots of rejections.
Background
I’m trying to fit a classic CFA of the form
\mathbf{Y} \sim \mathcal{N}(\mathbf{0},\ \mathbf{\Lambda} \mathbf{\Phi} \mathbf{\Lambda}' + \mathbf{\Theta}_{\delta})
but where one of the off-diagonal elements of \Theta_{\delta} is freely estimated, rather than being fixed at 0 in the traditional way. I think the jargon name for this type of model is a ‘Correlated Uniqueness Model’.
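Concretely, with my six measured variables, the structure I’m after (a sketch, writing \theta_{jj} for the residual variances and \theta_{14} for the free covariance between m1 and m4) is

\mathbf{\Theta}_{\delta} =
\begin{pmatrix}
\theta_{11} & 0 & 0 & \theta_{14} & 0 & 0 \\
0 & \theta_{22} & 0 & 0 & 0 & 0 \\
0 & 0 & \theta_{33} & 0 & 0 & 0 \\
\theta_{14} & 0 & 0 & \theta_{44} & 0 & 0 \\
0 & 0 & 0 & 0 & \theta_{55} & 0 \\
0 & 0 & 0 & 0 & 0 & \theta_{66}
\end{pmatrix}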
I’m working closely from this code provided by @Mauricio_Garnier-Villarre to specify a CFA with correlated factors. My model only tweaks Mauricio’s code in a few ways, namely:

- Only specify 2 latent factors, instead of 3;
- Remove the intercept parameters `b` from the linear predictors of the measured variables;
- Add a new parameter `theta_m1_m4` for the covariance between measured variables `m1` and `m4`;
- Add some new logic in the `model{}` block to set up the likelihood for `m1` and `m4` separately from the other measured variables, so that they can be explicitly modelled as drawn from a shared distribution.
I’m running the model on simulated data to test it out. There are many samples rejected over the course of the sampling, with a persistent message:
Exception: cholesky_decompose: Matrix m is not positive definite (in 'C:/Users/arand/AppData/Local/Temp/Rtmp8e6wlk/model-2d1c21761252.stan', line 131, column 2 to column 44)
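If I understand the rejections correctly, nothing in my parameterization stops theta_m1_m4 from wandering outside the range where the covariance matrix stays positive definite. For the 2×2 residual block of m1 and m4, I believe the condition is

\theta_{14}^2 < \sigma_1^2 \sigma_4^2

so proposals where |\theta_{14}| is too large relative to the residual SDs would presumably be rejected.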
Still, the density plots of the draws look generally OK for most of the parameters (the model ends up in the ballpark of the true parameter values, shown by the dashed lines). A priori, I would have thought the remaining weirdness in the plots could be overcome with more iterations and by fiddling with max_treedepth and adapt_delta, if not for the problems described below.
Problems
The main problem is that chains 2 and 4 return all `NaN` values for the parameters of interest other than `theta_m1_m4`, and for `theta_m1_m4` they return a single constant value for all draws. So the density plots above only include draws from chains 1 and 3 for all of the variables except `theta_m1_m4`, and the constant values in chains 2 and 4 account for the weird spikes in the plot for that variable. Other generated quantities for chains 2 and 4 look normal.
It seems likely that the problem has to do with the way I’ve specified `theta_m1_m4`, because I’ve previously run an identical model without that parameter and it sampled fine.
My Questions
- Have I made some clear error in the way I’ve introduced `theta_m1_m4` into the model?
- Could I change my Stan code somehow (a reparameterization, different priors, etc.) so that I don’t get the positive-definite sample rejections? (One untested idea is sketched just below this list.)
- Would this also resolve the issue of two of the chains having all-`NaN` values for the factor loadings and a single fixed value for `theta_m1_m4`?
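To illustrate what I mean in the second question: one idea I’ve had, but not tested, is to declare a bounded residual correlation and derive `theta_m1_m4` from it, so that the 2×2 residual block for `m1` and `m4` is positive definite by construction (`rho_m1_m4` is just a name I made up for this sketch):

```stan
parameters {
  // ... everything else as in the full model below ...
  real<lower=-1, upper=1> rho_m1_m4; // residual correlation between m1 and m4
}
transformed parameters {
  // implied residual covariance; |rho| < 1 and positive SDs keep the
  // 2x2 block for m1 and m4 positive definite by construction
  real theta_m1_m4 = rho_m1_m4 * sd_p[1] * sd_p[4];
}
```

The prior would then go on `rho_m1_m4` rather than on `theta_m1_m4` directly. I don’t know whether this would also fix the `NaN` chains, hence the third question.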
I’ve attached the draws for the variables of interest here, in case that’s helpful for diagnostic purposes. This is my first real foray into raw Stan (I’ve only used brms in the past), so I’d be grateful for any advice, especially if there’s basic stuff / low-hanging fruit I can learn from.
cursed_draws.csv (1.8 MB)
Data and Code
The Stan model:
```stan
data {
  int N; // sample size
  int P; // number of variables
  int D; // number of factors
  array[N] vector[P] X; // data matrix of order [N,P]
  int n_lam; // how many factor loadings are estimated
}
parameters {
  matrix[N, D] FS_UNC; // factor scores, matrix of order [N,D]
  cholesky_factor_corr[D] L_corr_d; // cholesky factor of the correlation between factors
  vector<lower=0>[P] sd_p; // residual sd for each variable
  real theta_m1_m4; // covariance between error terms of m1 and m4
  vector<lower=-30, upper=30>[n_lam] lam_UNC; // a bunch of lightly-constrained factor loadings
}
transformed parameters {
  // a vector to hold the factor means, which will be a bunch of 0s
  vector[D] M;
  M = rep_vector(0, D);
  // a vector to hold the factor SDs, which will be a bunch of 1s
  vector<lower=0>[D] Sd_d;
  Sd_d = rep_vector(1, D);
  // a vector to hold the linear predictors for the manifest variables,
  // each a deterministic function of loading * factor_score
  array[N] vector[P] mu_UNC;
  for (i in 1 : N) {
    mu_UNC[i, 1] = lam_UNC[1] * FS_UNC[i, 1];
    mu_UNC[i, 2] = lam_UNC[2] * FS_UNC[i, 1];
    mu_UNC[i, 3] = lam_UNC[3] * FS_UNC[i, 1];
    mu_UNC[i, 4] = lam_UNC[4] * FS_UNC[i, 2];
    mu_UNC[i, 5] = lam_UNC[5] * FS_UNC[i, 2];
    mu_UNC[i, 6] = lam_UNC[6] * FS_UNC[i, 2];
  }
  // the var-covar matrix for the factors, a deterministic function of the
  // fixed SDs and the estimated correlations defined in the parameters block
  cholesky_factor_cov[D] L_Sigma;
  L_Sigma = diag_pre_multiply(Sd_d, L_corr_d);
}
model {
  // Declare some priors
  L_corr_d ~ lkj_corr_cholesky(1); // prior on factor correlations
  lam_UNC ~ normal(0, 10); // prior on loadings
  sd_p ~ cauchy(0, 2.5); // prior on residual SDs of manifest variables
  theta_m1_m4 ~ cauchy(0, 2.5); // prior on residual covariance of m1 and m4
  // Set up the likelihood for m1 and m4, which need special treatment
  // because we want to estimate their covariance
  for (i in 1 : N) {
    vector[2] mu_sub;
    matrix[2, 2] sigma_sub;
    mu_sub[1] = mu_UNC[i, 1];
    mu_sub[2] = mu_UNC[i, 4];
    sigma_sub[1, 1] = square(sd_p[1]);
    sigma_sub[2, 2] = square(sd_p[4]);
    sigma_sub[1, 2] = theta_m1_m4;
    sigma_sub[2, 1] = theta_m1_m4;
    to_vector({X[i, 1], X[i, 4]}) ~ multi_normal_cholesky(mu_sub, sigma_sub);
  }
  // Set up the likelihoods of the other manifest variables and the latent factors
  for (i in 1 : N) {
    for (j in {2, 3, 5, 6}) {
      // manifest variables
      X[i, j] ~ normal(mu_UNC[i, j], sd_p[j]);
    }
    // latent factors
    FS_UNC[i] ~ multi_normal_cholesky(M, L_Sigma);
  }
}
generated quantities {
  array[N] real log_lik; // log likelihood of each datapoint
  real dev; // global deviance
  real log_lik0; // global log likelihood
  vector[N] log_lik_row;
  cov_matrix[P] Sigma; // covariance matrix of the manifest variables
  cov_matrix[D] Phi_lv; // covariance matrix of the latent factors
  matrix[P, P] lambda_phi_lambda; // I think these are the terms of the reconstructed var-covar matrix of the data?
  cholesky_factor_cov[P] L_Sigma_model; // I think this is the cholesky decomposition of the reconstructed var-covar matrix above?
  matrix[P, P] theta_del; // I think this is the matrix containing the error terms for the reconstructed empirical var-covar matrix?
  corr_matrix[D] Rho_UNC; // correlation matrix (uncorrected)
  corr_matrix[D] Rho; // correlation matrix (sign-corrected)
  matrix[P, D] lam; // factor loadings
  matrix[N, D] FS; // factor scores, matrix of order [N,D]
  // Do some fancy things to sign-correct the parameter estimates.
  // The idea seems to be that estimating the loadings with unconstrained signs
  // can lead to identification issues, so we take these steps to correct the signs.
  // See this Stan forum comment for more: https://discourse.mc-stan.org/t/non-convergence-of-latent-variable-model/12450/10?u=alex.b.r
  Rho_UNC = multiply_lower_tri_self_transpose(L_corr_d);
  Rho = Rho_UNC;
  FS = FS_UNC;
  lam = rep_matrix(0, P, D);
  lam[1 : 3, 1] = to_vector(lam_UNC[1 : 3]);
  lam[4 : 6, 2] = to_vector(lam_UNC[4 : 6]);
  // factor 1
  if (lam_UNC[1] < 0) {
    lam[1 : 3, 1] = to_vector(-1 * lam_UNC[1 : 3]);
    FS[ : , 1] = to_vector(-1 * FS_UNC[ : , 1]);
    if (lam_UNC[4] > 0) {
      Rho[1, 2] = -1 * Rho_UNC[1, 2];
      Rho[2, 1] = -1 * Rho_UNC[2, 1];
    }
  }
  // factor 2
  if (lam_UNC[4] < 0) {
    lam[4 : 6, 2] = to_vector(-1 * lam_UNC[4 : 6]);
    FS[ : , 2] = to_vector(-1 * FS_UNC[ : , 2]);
    if (lam_UNC[1] > 0) {
      Rho[2, 1] = -1 * Rho_UNC[2, 1];
      Rho[1, 2] = -1 * Rho_UNC[1, 2];
    }
  }
  // marginal log-likelihood based on sign-corrected parameters
  Phi_lv = quad_form_diag(Rho, Sd_d);
  lambda_phi_lambda = quad_form_sym(Phi_lv, transpose(lam));
  theta_del = diag_matrix(sd_p);
  Sigma = lambda_phi_lambda + theta_del;
  // update Sigma to reflect the presence of the new covariance parameter
  Sigma[1, 4] = theta_m1_m4;
  Sigma[4, 1] = theta_m1_m4;
  L_Sigma_model = cholesky_decompose(Sigma);
  for (i in 1 : N) {
    log_lik[i] = multi_normal_cholesky_lpdf(X[i] | rep_vector(0, P), L_Sigma_model);
  }
  log_lik0 = sum(log_lik); // global log-likelihood
  dev = -2 * log_lik0; // model deviance
}
```
Simulate data and fit the model with cmdstanr
```r
# Declare the model for lavaan
mtmm.model <- '
  f1 =~ .8*m1 + .6*m2 + .9*m3
  f2 =~ .1*m4 + .2*m5 + -.4*m6
  f1 ~~ .4*f2
  m1 ~~ .4*m4
'

# Simulate data from the model
fake_dat <- lavaan::simulateData(model = mtmm.model, sample.nobs = 4000)

# Compile the model from the Stan file (stan_file is the path to the .stan file above)
model <- cmdstanr::cmdstan_model(stan_file)

# Put the data into a list format Stan can understand
X <- as.matrix(fake_dat)
P <- ncol(fake_dat)
N <- nrow(fake_dat)
D <- 2
data_list <- list(N = N, P = P, D = D, X = X, n_lam = 6)
param <- c("lam", "Rho", "sd_p", "M", "Sd_d", "theta_m1_m4")

# Fit the model
fit <- model$sample(
  data = data_list,
  seed = 199,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000,
  iter_sampling = 2000
  # adapt_delta = 0.99,
  # max_treedepth = 12
)
```
I created the attached .csv file with the draws like so:
```r
library(tidybayes) # for gather_draws()
library(readr)     # for write_csv()

draws <- fit$draws(
  variables = c(
    "lam[1,1]",
    "lam[2,1]",
    "lam[3,1]",
    "lam[4,2]",
    "lam[5,2]",
    "lam[6,2]",
    "Rho[2,1]",
    "theta_m1_m4"
  )
) |>
  gather_draws() |>
  write_csv("cursed_draws.csv")
```
Investigate the weirdness
```r
library(tidybayes) # for gather_draws()
library(dplyr)     # for filter()

# The only non-NaN values in chains 2 and 4 are for theta_m1_m4,
# but they are constant within each chain
draws <- readRDS("fits/b07.03.samples.rds")
draws |>
  # Get the raw draws for the factor loadings and the between-factor correlation coefficient
  gather_draws(
    `lam[1,1]`,
    `lam[2,1]`,
    `lam[3,1]`,
    `lam[4,2]`,
    `lam[5,2]`,
    `lam[6,2]`,
    `Rho[2,1]`,
    theta_m1_m4
  ) |>
  filter(.chain %in% c(2, 4), !is.nan(.value))
```