Draws from Confirmatory Factor Analysis with covariance for measured variables are all NaNs or constant values for some chains

Summary

I’m looking for advice on how to resolve some problems I encounter when fitting a CFA model with a covariance parameter for two of the measured variables. Sampling proceeds with many rejections of the form Exception: cholesky_decompose: Matrix m is not positive definite, and some of the chains return only NaNs for some parameters and only constant values for others. Other chains seem to sample fine, despite also producing lots of rejections.


Background

I’m trying to fit a classic CFA of the form

\mathbf{Y} \sim \mathcal{N}(\mathbf{0}, \mathbf{\Lambda} \mathbf{\Phi} \mathbf{\Lambda}' + \mathbf{\Theta}_{\delta})

but where one of the off-diagonal elements of \Theta_{\delta} is freely estimated, rather than being fixed at 0 in the traditional way. I think the jargon name for this type of model is ‘Correlated Uniqueness Model’.
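To make that concrete, here’s a minimal R sketch of the model-implied covariance matrix, using the true loadings, factor correlation, and residual covariance from the simulation code further down (the residual variances of 1 are an assumption here, since lavaan’s simulateData picks its own defaults):

# Sigma = Lambda %*% Phi %*% t(Lambda) + Theta_delta
Lambda <- matrix(0, nrow = 6, ncol = 2)
Lambda[1:3, 1] <- c(0.8, 0.6, 0.9)              # loadings on f1
Lambda[4:6, 2] <- c(0.1, 0.2, -0.4)             # loadings on f2
Phi <- matrix(c(1, 0.4, 0.4, 1), 2, 2)          # factor covariance (SDs fixed at 1)
Theta <- diag(1, 6)                             # assumed residual variances of 1
Theta[1, 4] <- Theta[4, 1] <- 0.4               # the freely estimated off-diagonal
Sigma <- Lambda %*% Phi %*% t(Lambda) + Theta
all(eigen(Sigma, symmetric = TRUE)$values > 0)  # TRUE: positive definite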

I’m working closely from this code provided by @Mauricio_Garnier-Villarre to specify a CFA with correlated factors. My model tweaks Mauricio’s code in only a few ways, namely:

  • Only specify 2 latent factors, instead of 3;
  • Remove the intercept parameters b from the linear predictors of the measured variables;
  • Add a new parameter theta_m1_m4 for the covariance between measured variables m1 and m4;
  • Add some new logic in the model{} block to set up the likelihood for m1 and m4 separately from the other measured variables so that they can be explicitly modelled as drawn from a shared distribution.

I’m running the model on simulated data to test it out. Many samples are rejected over the course of sampling, with a persistent message:

Exception: cholesky_decompose: Matrix m is not positive definite (in 'C:/Users/arand/AppData/Local/Temp/Rtmp8e6wlk/model-2d1c21761252.stan', line 131, column 2 to column 44)

Still, the density plots of the draws look generally OK for most of the parameters (the model ends up in the ballpark of the true parameter values, shown by the dashed lines). I would think a priori that the remaining weirdness in the plots could be overcome with more iterations and by fiddling with max_treedepth and adapt_delta, if not for the problems described below.

Problems

The main problem is that chains 2 and 4 return all-NaN values for every parameter of interest except theta_m1_m4, and for theta_m1_m4 they return a single constant value across all draws. So the density plots above only include draws from chains 1 and 3 for all of the variables except theta_m1_m4, and the constant values in chains 2 and 4 account for the weird spikes in that variable’s plot. Other generated quantities for chains 2 and 4 look normal.

It seems likely that the problem has to do with the way I’ve specified theta_m1_m4, because I previously ran an otherwise identical model without that parameter and it sampled fine.
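One thing that could plausibly contribute to the positive-definiteness rejections: theta_m1_m4 is declared as an unconstrained real, so nothing stops the sampler from proposing values where the 2x2 residual block [[sd_p[1]^2, theta], [theta, sd_p[4]^2]] is not positive definite, which happens whenever |theta| >= sd_p[1] * sd_p[4]. A quick base-R illustration with made-up numbers:

s1 <- 0.6; s4 <- 0.9                             # made-up residual SDs
ok  <- matrix(c(s1^2, 0.5, 0.5, s4^2), 2, 2)     # |0.5| < s1 * s4 = 0.54: PD
bad <- matrix(c(s1^2, 0.6, 0.6, s4^2), 2, 2)     # |0.6| > 0.54: not PD
chol(ok)                                         # works
try(chol(bad))                                   # errors: not positive definite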

My Questions

  • Have I made some clear error in the way I’ve introduced theta_m1_m4 into the model?
  • Could I change my Stan code somehow (reparameterization, different priors, etc.) so that I don’t get the positive-definiteness rejections?
  • Would fixing that resolve the issues I’m seeing with two of the chains having all NaN values for the factor loadings and a single fixed value for theta_m1_m4?

I’ve attached the draws for the variables of interest here, in case that’s helpful for diagnostic purposes. This is my first real foray into raw Stan (I’ve only used brms in the past), so I’d be grateful for any advice, especially if there’s basic stuff / low-hanging fruit I can learn from.

cursed_draws.csv (1.8 MB)

Data and Code

The Stan model:

data {
  int N; // sample size
  int P; // number of variables
  int D; // number of factors
  array[N] vector[P] X; // data matrix of order [N,P]
  int n_lam; // how many factor loadings are estimated
}
parameters {
  matrix[N, D] FS_UNC; // factor scores, matrix of order [N,D]
  cholesky_factor_corr[D] L_corr_d; // cholesky correlation between factors
  vector<lower=0>[P] sd_p; // residual sd for each variable
  real theta_m1_m4; // covariance between error terms of m1 and m4
  vector<lower=-30, upper=30>[n_lam] lam_UNC; // A bunch of lightly-constrained factor loadings
}
transformed parameters {
  // a vector to hold the factor means, which will be a bunch of 0s
  vector[D] M;
  M = rep_vector(0, D);
  
  // a vector to hold the factor SDs, which will be a bunch of 1s. 
  vector<lower=0>[D] Sd_d;
  Sd_d = rep_vector(1, D);
  
  // A vector to hold the linear predictors for the manifest variables, which are each a deterministic function of loading*factor_score
  array[N] vector[P] mu_UNC;
  for (i in 1 : N) {
    mu_UNC[i, 1] = lam_UNC[1] * FS_UNC[i, 1];
    mu_UNC[i, 2] = lam_UNC[2] * FS_UNC[i, 1];
    mu_UNC[i, 3] = lam_UNC[3] * FS_UNC[i, 1];
    mu_UNC[i, 4] = lam_UNC[4] * FS_UNC[i, 2];
    mu_UNC[i, 5] = lam_UNC[5] * FS_UNC[i, 2];
    mu_UNC[i, 6] = lam_UNC[6] * FS_UNC[i, 2];
  }
  
  // the var-covar matrix for the factors, a deterministic function of the deterministic SDs and the estimated rhos defined in the parameters block
  cholesky_factor_cov[D] L_Sigma;
  L_Sigma = diag_pre_multiply(Sd_d, L_corr_d);

}
model {
  // Declare some priors
  L_corr_d ~ lkj_corr_cholesky(1); // Prior on factor corrs
  lam_UNC ~ normal(0, 10); // Prior on loadings
  sd_p ~ cauchy(0, 2.5); // Prior on residual SDs of manifest variables
  theta_m1_m4 ~ cauchy(0, 2.5); // Prior on residual covariance of m1 and m4

  // Set up the likelihood for m1 and m4, which need special treatment because we want to estimate their covariance
  for (i in 1 : N) {
    vector[2] mu_sub;
    matrix[2, 2] sigma_sub;
    
    mu_sub[1] = mu_UNC[i, 1];
    mu_sub[2] = mu_UNC[i, 4];
    sigma_sub[1,1] = square(sd_p[1]);
    sigma_sub[2,2] = square(sd_p[4]);
    sigma_sub[1,2] = theta_m1_m4;
    sigma_sub[2,1] = theta_m1_m4;
    
    to_vector({X[i, 1], X[i, 4]}) ~ multi_normal_cholesky(mu_sub, sigma_sub);
  }

  // Set up the likelihoods of the other manifest variables and latent factors
  for (i in 1 : N) {
    for (j in {2,3,5,6}) {
      // Manifest variables
      X[i, j] ~ normal(mu_UNC[i, j], sd_p[j]);
    }
    // Latent factors
    FS_UNC[i] ~ multi_normal_cholesky(M, L_Sigma);
  }
}
generated quantities {
  array[N] real log_lik; // log likelihood of each datapoint
  real dev; // global deviance
  real log_lik0; // global log likelihood
  vector[N] log_lik_row;
  
  cov_matrix[P] Sigma; // Covariance matrix of the manifest variables
  cov_matrix[D] Phi_lv; // Covariance matrix of the latent factors
  matrix[P, P] lambda_phi_lambda; // I think these are the terms of the reconstructed var-covar matrix of the data?
  cholesky_factor_cov[P] L_Sigma_model; // I think this is the cholesky decomposition of the reconstructed var-covar matrix above?
  matrix[P, P] theta_del; // I think this is the matrix containing the error terms for the reconstructed empirical var-covar matrix?
  
  corr_matrix[D] Rho_UNC; /// correlation matrix
  corr_matrix[D] Rho; /// correlation matrix
  matrix[P, D] lam; // factor loadings
  matrix[N, D] FS; // factor scores, matrix of order [N,D]
  
  // Do some fancy things to sign-correct the parameter estimates.
  // The idea seems to be that when we estimate the loadings with unconstrained signs
  // It can lead to identification issues. So we take these steps to correct the signs.
  // See this Stan forum comment for more: https://discourse.mc-stan.org/t/non-convergence-of-latent-variable-model/12450/10?u=alex.b.r
  Rho_UNC = multiply_lower_tri_self_transpose(L_corr_d);
  Rho = Rho_UNC;
  FS = FS_UNC;
  lam = rep_matrix(0, P, D);
  lam[1 : 3, 1] = to_vector(lam_UNC[1 : 3]);
  lam[4 : 6, 2] = to_vector(lam_UNC[4 : 6]);
  
  // factor 1
  if (lam_UNC[1] < 0) {
    lam[1 : 3, 1] = to_vector(-1 * lam_UNC[1 : 3]);
    FS[ : , 1] = to_vector(-1 * FS_UNC[ : , 1]);
    
    if (lam_UNC[4] > 0) {
      Rho[1, 2] = -1 * Rho_UNC[1, 2];
      Rho[2, 1] = -1 * Rho_UNC[2, 1];
    }
  }
  // factor 2
  if (lam_UNC[4] < 0) {
    lam[4 : 6, 2] = to_vector(-1 * lam_UNC[4 : 6]);
    FS[ : , 2] = to_vector(-1 * FS_UNC[ : , 2]);
    
    if (lam_UNC[1] > 0) {
      Rho[2, 1] = -1 * Rho_UNC[2, 1];
      Rho[1, 2] = -1 * Rho_UNC[1, 2];
    }
  }
  
  /// marginal log-likelihood based on signed corrected parameters
  Phi_lv = quad_form_diag(Rho, Sd_d);
  lambda_phi_lambda = quad_form_sym(Phi_lv, transpose(lam));
  theta_del = diag_matrix(sd_p);
  
  Sigma = lambda_phi_lambda + theta_del;
  // Update sigma to reflect the presence of the new covariance parameters
  Sigma[1,4] = theta_m1_m4;
  Sigma[4,1] = theta_m1_m4;
  L_Sigma_model = cholesky_decompose(Sigma);
  
  for (i in 1 : N) {
    log_lik[i] = multi_normal_cholesky_lpdf(X[i] | rep_vector(0, P), L_Sigma_model);
  }
  
  log_lik0 = sum(log_lik); // global log-likelihood
  dev = -2 * log_lik0; // model deviance
}

Simulate data and fit the model with cmdstanr

# Declare the model for lavaan
mtmm.model <- ' 
  f1 =~ .8*m1 + .6*m2 + .9*m3
  f2 =~ .1*m4 + .2*m5 + -.4*m6
  f1 ~~ .4*f2
  m1 ~~ .4*m4
'

# Simulate data from the model
fake_dat <- lavaan::simulateData(model = mtmm.model, sample.nobs = 4000)

# Compile the model from the Stan file (stan_file holds the path to the Stan program above)
model <- cmdstanr::cmdstan_model(stan_file)

# Put the data into a list format Stan can understand
X <- as.matrix(fake_dat)
P <- ncol(fake_dat) 
N <- nrow(fake_dat)
D <- 2

data_list <- list(N=N,P=P,D=D,X=X, n_lam=6)
param <- c("lam","Rho","sd_p","M","Sd_d", "theta_m1_m4")

# Fit the model
fit <- model$sample(
  data = data_list,
  seed = 199,
  chains = 4,
  parallel_chains = 4,
  iter_warmup = 1000,
  iter_sampling = 2000,
 # adapt_delta=0.99, 
 # max_treedepth = 12
)
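As an aside, cmdstanr’s built-in diagnostic summary is a handy way to check whether particular chains misbehaved, and the posterior package can pull out a single suspect chain. A generic sketch, not part of the original run:

fit$diagnostic_summary()   # divergences, max_treedepth hits, E-BFMI per chain

# Inspect one suspect chain (e.g. chain 2) for constant or NaN draws
posterior::summarise_draws(
  posterior::subset_draws(fit$draws("theta_m1_m4"), chain = 2)
)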

I created the attached .csv file with the draws like so:

library(tidybayes)
library(readr)

draws <- fit$draws(
  variables = c(
    "lam[1,1]", "lam[2,1]", "lam[3,1]",
    "lam[4,2]", "lam[5,2]", "lam[6,2]",
    "Rho[2,1]", "theta_m1_m4"
  )
) |>
  gather_draws(
    `lam[1,1]`, `lam[2,1]`, `lam[3,1]`,
    `lam[4,2]`, `lam[5,2]`, `lam[6,2]`,
    `Rho[2,1]`, theta_m1_m4
  ) |>
  write_csv("cursed_draws.csv")

Investigate the weirdness

# The only non-NaN values in chains 2 and 4 are for theta_m1_m4,
# but they are constant within each chain
library(tidybayes)
library(dplyr)

draws <- readRDS("fits/b07.03.samples.rds")

draws |>
  # Get the raw draws for the factor loadings and the between-factor correlation coefficient
  gather_draws(
    `lam[1,1]`, `lam[2,1]`, `lam[3,1]`,
    `lam[4,2]`, `lam[5,2]`, `lam[6,2]`,
    `Rho[2,1]`, theta_m1_m4
  ) |>
  filter(.chain %in% c(2, 4), !is.nan(.value))
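To put numbers on it, you can also count the non-finite draws per chain, along the same lines (a small dplyr sketch; the variable list is abbreviated here):

# Count NaN draws per chain for each variable of interest
draws |>
  gather_draws(`lam[1,1]`, `Rho[2,1]`, theta_m1_m4) |>
  group_by(.variable, .chain) |>
  summarise(n_nan = sum(is.nan(.value)), .groups = "drop")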

For the bivariate normal on m1 and m4, it looks like you are passing the covariance matrix sigma_sub to multi_normal_cholesky(). But multi_normal_cholesky() requires its second argument to be the Cholesky factor of the covariance matrix, not the covariance matrix itself. You could simply use multi_normal() instead of multi_normal_cholesky() there.
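To see concretely what goes wrong: multi_normal_cholesky() treats its second argument as the lower-triangular Cholesky factor L and evaluates the density under the covariance L * L', so passing the covariance matrix S itself means the density is computed under the wrong matrix. A small base-R illustration with made-up numbers:

S <- matrix(c(1.0, 0.4, 0.4, 0.8), 2, 2)  # a valid 2x2 covariance
L <- t(chol(S))                           # lower-triangular Cholesky factor
L %*% t(L)                                # recovers S, as intended
S %*% t(S)                                # roughly what is implied if S is passed as "L"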

Alternatively, for better efficiency, you could define the Cholesky factor of that correlation matrix as a parameter, taking a similar approach to what is already done for L_corr_d.

Also, if you haven’t already seen blavaan, you might take a look at it. You could use the lavaan syntax, and it might estimate this model with more speed/efficiency than the Stan model defined here.

Thanks @edm, things went much more smoothly after fixing the mistake you identified. I also simplified the generated quantities block, which seems to have made a difference as well.

Sadly blavaan is not an option for this project because the model is to be expanded such that the factors are used as predictors in a survival analysis. I did have a look at the blavaan-generated Stan code for this model before posting my question here, but it was a bit complex for my level of Stan knowledge.

Here’s the new version of my model that samples well, for posterity:

data {
  int N; // sample size
  int P; // number of variables
  int D; // number of factors
  array[N] vector[P] X; // data matrix of order [N,P]
  int n_lam; // how many factor loadings are estimated
}
parameters {
  matrix[N, D] FS_UNC; // factor scores, matrix of order [N,D]
  cholesky_factor_corr[D] L_corr_d; // cholesky correlation between factors
  cholesky_factor_corr[2] L_corr_m1_m4; // cholesky correlation between m1 and m4
  vector<lower=0>[P] sd_p; // residual sd for each variable
  vector<lower=-30, upper=30>[n_lam] lam_UNC; // A bunch of lightly-constrained factor loadings
}
transformed parameters {
  // a vector to hold the factor means, which will be a bunch of 0s
  vector[D] M;
  M = rep_vector(0, D);
  
  // a vector to hold the factor SDs, which will be a bunch of 1s. 
  vector<lower=0>[D] Sd_d;
  Sd_d = rep_vector(1, D);
  
  // A vector to hold the linear predictors for the manifest variables, which are each a deterministic function of loading*factor_score
  array[N] vector[P] mu_UNC;
  for (i in 1 : N) {
    mu_UNC[i, 1] = lam_UNC[1] * FS_UNC[i, 1];
    mu_UNC[i, 2] = lam_UNC[2] * FS_UNC[i, 1];
    mu_UNC[i, 3] = lam_UNC[3] * FS_UNC[i, 1];
    mu_UNC[i, 4] = lam_UNC[4] * FS_UNC[i, 2];
    mu_UNC[i, 5] = lam_UNC[5] * FS_UNC[i, 2];
    mu_UNC[i, 6] = lam_UNC[6] * FS_UNC[i, 2];
  }
  
  // the var-covar matrix for the factors, a deterministic function of the deterministic SDs and the estimated rhos defined in the parameters block
  cholesky_factor_cov[D] L_Sigma;
  L_Sigma = diag_pre_multiply(Sd_d, L_corr_d);

  // Same, but for the var-covar matrix of m1 and m4 (a 2x2 block, so dimension 2 rather than D)
  cholesky_factor_cov[2] L_Sigma_m1_m4;
  L_Sigma_m1_m4 = diag_pre_multiply(sd_p[{1, 4}], L_corr_m1_m4);
}
model {
  // Declare some priors
  L_corr_d ~ lkj_corr_cholesky(1); // Prior on factor corrs
  L_corr_m1_m4 ~ lkj_corr_cholesky(1); // Prior on corr between m1 and m4
  lam_UNC ~ normal(0, 10); // Prior on loadings
  sd_p ~ cauchy(0, 2.5); // Prior on residual SDs of manifest variables

  // Set up the joint log likelihood function
  for (i in 1 : N) {
    // The uncorrelated measured variables
    for (j in {2,3,5,6}) {
      // Manifest variables
      X[i, j] ~ normal(mu_UNC[i, j], sd_p[j]);
    }

    // m1 and m4, which get special treatment for being correlated
    to_vector({X[i, 1], X[i, 4]}) ~ multi_normal_cholesky([mu_UNC[i, 1], mu_UNC[i, 4]]', L_Sigma_m1_m4);

    // Latent factors
    FS_UNC[i] ~ multi_normal_cholesky(M, L_Sigma);
  }
}
generated quantities {
  corr_matrix[D] Rho_UNC; /// correlation matrix for factors
  corr_matrix[D] Rho_factors; /// correlation matrix for factors
  corr_matrix[2] Rho_m1_m4; /// correlation matrix for m1 and m4
  matrix[P, D] lam; // factor loadings
  matrix[N, D] FS; // factor scores, matrix of order [N,D]
  
  // Do some fancy things to sign-correct the parameter estimates.
  // The idea seems to be that when we estimate the loadings with unconstrained signs
  // It can lead to identification issues. So we take these steps to correct the signs.
  // See this Stan forum comment for more: https://discourse.mc-stan.org/t/non-convergence-of-latent-variable-model/12450/10
  Rho_UNC = multiply_lower_tri_self_transpose(L_corr_d);
  Rho_factors = Rho_UNC;
  Rho_m1_m4 = multiply_lower_tri_self_transpose(L_corr_m1_m4);
  FS = FS_UNC;
  lam = rep_matrix(0, P, D);
  lam[1 : 3, 1] = to_vector(lam_UNC[1 : 3]);
  lam[4 : 6, 2] = to_vector(lam_UNC[4 : 6]);
  
  // factor 1
  if (lam_UNC[1] < 0) {
    lam[1 : 3, 1] = to_vector(-1 * lam_UNC[1 : 3]);
    FS[ : , 1] = to_vector(-1 * FS_UNC[ : , 1]);
    
    if (lam_UNC[4] > 0) {
      Rho_factors[1, 2] = -1 * Rho_UNC[1, 2];
      Rho_factors[2, 1] = -1 * Rho_UNC[2, 1];
    }
  }
  // factor 2
  if (lam_UNC[4] < 0) {
    lam[4 : 6, 2] = to_vector(-1 * lam_UNC[4 : 6]);
    FS[ : , 2] = to_vector(-1 * FS_UNC[ : , 2]);
    
    if (lam_UNC[1] > 0) {
      Rho_factors[2, 1] = -1 * Rho_UNC[2, 1];
      Rho_factors[1, 2] = -1 * Rho_UNC[1, 2];
    }
  }
}
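One consequence of this reparameterization is that the residual covariance is no longer a named parameter; it can be recovered from the draws as theta_m1_m4 = rho * sd_1 * sd_4. A sketch of that conversion, assuming the fit object from above:

# Recover the implied residual covariance of m1 and m4 from the new parameters
d <- posterior::as_draws_df(fit$draws(c("Rho_m1_m4[2,1]", "sd_p[1]", "sd_p[4]")))
theta_draws <- d$`Rho_m1_m4[2,1]` * d$`sd_p[1]` * d$`sd_p[4]`
summary(theta_draws)  # should sit near the simulated value of 0.4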