Fitting a Bayesian CFA Model in Stan

Hey everyone,

I am trying to build my first Bayesian model, a Bayesian confirmatory factor analysis, working with rstan. I am not quite sure if I did it right. The model does converge, but I have problems with long running times (many hours), so I wanted to ask for hints here.

Here is my model:

mod_4 <- "
data {
  int<lower=0> N;              // Number of observations
  int<lower=0> K;              // Number of observed variables
  int<lower=0> F;              // Number of factors
  matrix[N, K] Y;              // Matrix of observed data
  int<lower=1,upper=F> factor_map[K]; // Factor mapping of variables
}

parameters {
  vector<lower=0, upper=1>[K] lambda;              // Factor loadings
  cholesky_factor_corr[F] L_Omega; // Cholesky factor of the factor correlation matrix
  vector<lower=0>[F] tau;        // Scale parameters of the latent factors
  matrix[N, F] eta_tilde;        // Uncorrelated latent variables
  vector<lower=0, upper=1>[K] sigma;        // Residual standard deviations
}

transformed parameters {
  matrix[N, F] eta;                // Latent variables with covariance structure
  matrix[F, F] Omega;              // Correlation matrix of the factors

  Omega = multiply_lower_tri_self_transpose(L_Omega); // Compute Omega from L_Omega
  // Scaled, correlated latent factors; note the transpose, so that each row of
  // eta has covariance diag(tau) * Omega * diag(tau)
  eta = eta_tilde * diag_pre_multiply(tau, L_Omega)';
}

model {
  // Informative Priors for factor loadings
  lambda[1:3] ~ normal(0.84, 0.05);    // Positive Emotions (standard deviation widened; prior was too narrow)
  lambda[4:6] ~ normal(0.65, 0.200);   // Engagement
  lambda[7:9] ~ normal(0.78, 0.107);   // Relationships
  lambda[10:12] ~ normal(0.87, 0.05);  // Meaning (standard deviation widened; prior was too narrow)
  lambda[13:15] ~ normal(0.75, 0.151); // Accomplishment
  
  // Uninformative priors for sigma 
  sigma ~ beta(2,3);

  // LKJ prior for the correlation matrix
  L_Omega ~ lkj_corr_cholesky(1);   

  // Priors for the scale parameters of the latent factors
  tau ~ normal(0, 1);

  // Latent variables as standard normal
  to_vector(eta_tilde) ~ normal(0, 1);

  // Likelihood of the observed variables
  for (k in 1:K) {
    Y[, k] ~ normal(lambda[k] * eta[, factor_map[k]], sigma[k]);
  }
}
"

Further, I have to run the model with a lot of iterations. I run it with the following arguments:

fit_bayes_4 <- stan(
  model_code = mod_4,
  data = stan_data,
  iter = 22000,
  chains = 4,
  # verbose = TRUE,
  control = list(adapt_delta = 0.9995, max_treedepth = 20)
)

I am really not sure if I did it right, and since I don't have any comparison (this is my first Bayesian model), I am uncertain whether this is normal (the long running times and the many iterations needed to get acceptable Rhat and n_eff).

I would be really grateful if someone could give me their thoughts on my model!

Thanks in advance!

Welcome to the Stan Discourse! I hope you find the community here helpful.

I believe you have an error in your model. EDIT: Sorry, I'm wrong. I suspect that my misunderstanding of factor_map is key, and means my first suggestion below is incorrect. If I had to guess, each variable is mapped to only a single factor, which is why there are only K loadings. Also, the suggestion below is more like an EFA, not CFA, so that is my mistake. Either way, the sufficient statistics approach would be more efficient.

You have K variables and F factors. So your loadings should be an F × K matrix rather than a vector of length K.

I don’t know what factor_map is doing in this case. But here is the matrix form when taking the approach above.

transformed parameters{
    matrix[N,F] eta;
    matrix[F,K] lambda;
    // ...
}
model{
    for(n in 1:N){
        Y[n,] ~ normal(eta[n,] * lambda, sigma);
    }

    // alternatively...
    to_vector(Y) ~ normal(to_vector(eta * lambda), to_vector(rep_matrix(sigma', N)));
}

An even more efficient approach would be to use a sufficient-statistics estimator. Rather than modelling the raw data, you model the scale matrix (covariance matrix * (N-1)). This is MUCH more efficient because it doesn’t require you to estimate the latent variables. However, it only works for complete data under the multivariate normal assumption.
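If you go that route, the likelihood can be sketched roughly like this (my untested sketch; `Sigma_model` is a placeholder for the model-implied K × K covariance matrix built from the loadings, factor covariances, and residual variances):

```stan
data {
  int<lower=1> N;
  int<lower=1> K;
  matrix[K, K] S;   // scale matrix: (N - 1) * sample covariance of Y
}
model {
  // ... priors on loadings, factor covariances, residual variances ...
  // Sigma_model: model-implied covariance, assumed constructed above
  S ~ wishart(N - 1, Sigma_model);
}
```

Because the latent variables are integrated out analytically, the parameter space shrinks from N × F latent values plus structural parameters down to just the structural parameters.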

I agree that the sufficient statistic approach can be faster, especially for large N.

Another thing to consider is marginalizing over eta. Instead of Y[, k] ~ normal(), you would have something like

matrix[F, F] psi = quad_form_sym(Omega, diag_matrix(tau));
matrix[K, F] Lambda = rep_matrix(0, K, F);
for (k in 1:K) Lambda[k, factor_map[k]] = lambda[k];
matrix[K, K] Sigma = quad_form_sym(psi, Lambda') + diag_matrix(square(sigma));
for (i in 1:N) {
  Y[i, ] ~ multi_normal(rep_vector(0, K), Sigma);
}

You might also look at the blavaan R package, which contains many of these optimizations and uses lavaan syntax for model specification. Here is an old post linking to some other related information:


You chose a challenging problem on which to start. Most people start with some simple regressions to get a feel for how everything works.

Why do you have to run with a lot of iterations? I'm guessing from your adapt_delta and max_treedepth that you were getting divergence warnings and tried to fix them based on the advice the warning gave you? I'm afraid that almost never works. Instead, it points out that the main problem your model is having is its posterior geometry. The only way to really fix this is to reparameterize.

Why are you upper-bounding the scale at 1?

vector<lower=0, upper=1>[K] sigma;

What does the fit for sigma look like? If it's bunching up a lot of probability mass near 1, that could be one source of computational issues.

I think people usually mean beta(1, 1) (uniform) or beta(0.5, 0.5) (Jeffreys) when they want "uninformative" priors on a parameter. The prior you use will gently regularize toward 2/5 with the strength of three prior observations.

Are your posteriors consistent with the normal priors you took? A more efficient way to code all this and also supply a non-centered parameterization would be to take

transformed data {
  vector[15] mu_lambda;
  vector[15] sigma_lambda;
  mu_lambda[1:3] = rep_vector(0.84, 3);
  ...
}
parameters {
  vector[15] lambda_std;
  ...
}
model {
  vector[15] lambda = mu_lambda + sigma_lambda .* lambda_std;
  lambda_std ~ normal(0, 1);
  ...
}

It might help with computation to break those lambda variables into groups and give them offset and multiplier values, e.g.,

vector<offset=0.78, multiplier=0.107>[3] lambda7_9;
...
lambda7_9 ~ normal(0.78, 0.107);

I’m afraid I don’t understand what the factor_map is doing here to marginalize.

It's more efficient to use lambda, eta, and factor_map in a matrix multiply that you then index. As is, the code creates a new column vector out of eta each iteration, which is inefficient.
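Concretely, one way to write that (a sketch; it builds a K × F loading matrix from factor_map so that the whole likelihood becomes a single matrix multiply):

```stan
transformed parameters {
  matrix[K, F] Lambda = rep_matrix(0, K, F);
  for (k in 1:K) {
    Lambda[k, factor_map[k]] = lambda[k];  // one nonzero loading per variable
  }
  // ... eta as before ...
}
model {
  // eta * Lambda' is N x K; column k equals lambda[k] * eta[, factor_map[k]]
  to_vector(Y) ~ normal(to_vector(eta * Lambda'), to_vector(rep_matrix(sigma', N)));
}
```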

Why is sigma not also a size F variable with sigma[factor_map[k]] rather than sigma[k]? Giving each observed variable its own variance can be problematic, but you're going to have multiple data points in Y[, k] to fit it.

Also, breaking down Y[, k] like this is inefficient. It’s better if Y is declared as a transposed array of vectors so that you can use Y[k] without a copy rather than Y[ , k], which creates and populates a new vector from the k-th row of Y.
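For example (a sketch; note that Y is now declared transposed in the data block):

```stan
data {
  vector[N] Y[K];   // Y[k] holds the N observations of variable k
}
model {
  for (k in 1:K) {
    Y[k] ~ normal(lambda[k] * eta[, factor_map[k]], sigma[k]);  // Y[k] needs no copy
  }
}
```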

I'm not sure why you're creating Omega: it's not used anywhere. If you want to save it for output, put it in generated quantities. But I wouldn't even do that; you can compute it outside of Stan, and storing it will take a lot of disk space. Definitely don't leave it as is, though, since that also kicks off a ton of autodiff that doesn't go anywhere.

But what I’m really unclear about is why you set up a multivariate prior and then only use marginalizations of it. That’s probably wasteful. Did you really want to give some quantity a multivariate prior? Is there a self-contained math definition somewhere of the model you’re trying to fit? If not, doing that first and posting it here will make it a lot easier to help.

I should have also said that to discover where you need to reparameterize, you need to look at posteriors of pairs of parameters. If you find funnel-like or banana-like shapes, you'll see where the sampler is going to struggle. We can help more if you do some of the legwork on this to see where things are failing and to help with understanding the intention of the model.

Per @Bob_Carpenter 's comment about starting with simpler models, I would start with a model that only includes one factor and try to address any issues. Then, add the second factor and solve any issues that arise. Eventually you will arrive back at your original model, though with hopefully fewer issues.

So start with a model of just the Positive Emotion items, then add in the Engagement items, etc…
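For instance, a minimal one-factor model for just the three Positive Emotions items might look like this (an untested sketch that reuses the loading prior from the original post):

```stan
data {
  int<lower=0> N;
  matrix[N, 3] Y;                       // the three Positive Emotions items
}
parameters {
  vector<lower=0, upper=1>[3] lambda;   // loadings
  vector[N] eta;                        // single standardized latent factor
  vector<lower=0>[3] sigma;             // residual standard deviations
}
model {
  lambda ~ normal(0.84, 0.05);
  eta ~ normal(0, 1);
  sigma ~ normal(0, 1);
  for (k in 1:3) {
    Y[, k] ~ normal(lambda[k] * eta, sigma[k]);
  }
}
```

Once this samples cleanly, the correlated-factor structure (tau, L_Omega, eta_tilde) can be reintroduced one factor at a time.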