Non-convergence of latent variable model

A simple solution is to set the latent variable variances to 1. Sample the loadings unconstrained and create a copy of the loadings in the transformed parameters block. Select one item per factor as a marker item. Then check factor by factor: if the marker item's loading is negative, multiply all the loadings on that factor by -1.

You will also have to adjust the interfactor correlation matrix: create a copy of it in the transformed parameters block and multiply the relevant correlations by -1. If the loadings on factor 2 were multiplied by -1, then all the correlations involving factor 2 should also be multiplied by -1; that is, multiply both row 2 and column 2 of the correlation matrix.

You’d have to do all this before creating lambda_phi_lambda. blavaan now does something like this.
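For concreteness, here is a minimal sketch of the idea for one factor, assuming a P x D unconstrained loading matrix lam_unc and an unconstrained correlation matrix Rho_unc (these names are placeholders; a full working model appears further down the thread):

// inside transformed parameters (or generated quantities, as blavaan does)
matrix[P, D] lam = lam_unc;   // copy of the unconstrained loadings
matrix[D, D] Rho = Rho_unc;   // copy of the interfactor correlation matrix
if (lam_unc[1, 1] < 0) {      // item 1 is the marker item for factor 1
  lam[, 1] = -lam_unc[, 1];   // flip every loading on factor 1
  Rho[1, ] = -Rho_unc[1, ];   // flip row 1 of the correlations ...
  Rho[, 1] = -Rho_unc[, 1];   // ... and column 1
  Rho[1, 1] = 1;              // the diagonal stays 1
}
// repeat for the remaining factors, then build lambda_phi_lambda from lam and Rho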


Just to clarify, blavaan does this in the generated quantities block rather than the transformed parameters block. Right now blavaan has two different Stan models: one that creates factor scores through data augmentation (target="stanclassic"), and the now-default precompiled one that does not estimate factor scores and works with the covariance matrix equations, based on the LISREL parameterization (target="stan"). In both methods the parameters are sign-corrected in the generated quantities block.

You can export the Stan code for either method and see how blavaan does this by using the mcmcfile argument:

HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '

fit <- bcfa(HS.model, data = HolzingerSwineford1939, std.lv=T, mcmcfile = "default", target = "stan")
fit2 <- bcfa(HS.model, data = HolzingerSwineford1939, std.lv=T, mcmcfile = "augmentation", target = "stanclassic")

OP can make these changes in either location in their current code, which follows the LISREL measurement equation.

I would not expect OP to have much use for the exported blavaan Stan code as it is quite bulky, and difficult to follow in certain parts.

Agreed, the exported blavaan Stan code is bulky and can be difficult to follow. I still like to recommend these options, as it's a good way to learn coding tricks by seeing code that works.

As for the model, I worked an example similar to the original model posted, with the sign-corrected parameters in the generated quantities block. A few comments about the model: I estimated it with the data augmentation approach, which for me at least makes it easier to extend to multilevel CFA. But the log-likelihood for model comparison is computed with the covariance matrix approach, so the marginal log-likelihood is used instead of the conditional, as recommended by

Merkle, E. C., Furr, D., & Rabe-Hesketh, S. (2019). Bayesian model assessment: Use of conditional vs marginal likelihoods. Psychometrika, 84, 802–829.
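Concretely, marginalizing over the factor scores gives the covariance-matrix form that the generated quantities block below uses for the log-likelihood,

$$
X_i \sim \mathcal{N}\!\left(b,\; \Lambda \Phi \Lambda^{\top} + \Theta\right),
$$

where $\Theta$ is the diagonal matrix of residual variances.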

data{
int N; // sample size
int P; // number of variables
int D; // number of dimensions
vector[P] X[N]; // data matrix of order [N,P]
int n_lam; // how many factor loadings are estimated
}

parameters{
vector[P] b; // intercepts
matrix[N,D] FS_UNC; // factor scores, matrix of order [N,D]
cholesky_factor_corr[D] L_corr_d; // cholesky correlation between factors
vector<lower=0>[P] sd_p; // residual sd for each variable
vector<lower=-30, upper=30>[n_lam] lam_UNC; // loadings, not constrained to be positive
}

transformed parameters{
vector[D] M; // factor means
vector<lower=0>[D] Sd_d; // sd for each factor
vector[P] mu_UNC[N]; // predicted values
cholesky_factor_cov[D] L_Sigma;

M = rep_vector(0, D);
Sd_d = rep_vector(1, D);

L_Sigma = diag_pre_multiply(Sd_d, L_corr_d);


for(i in 1:N){
mu_UNC[i,1] = b[1] + lam_UNC[1]*FS_UNC[i,1];
mu_UNC[i,2] = b[2] + lam_UNC[2]*FS_UNC[i,1];
mu_UNC[i,3] = b[3] + lam_UNC[3]*FS_UNC[i,1];
mu_UNC[i,4] = b[4] + lam_UNC[4]*FS_UNC[i,2];
mu_UNC[i,5] = b[5] + lam_UNC[5]*FS_UNC[i,2];
mu_UNC[i,6] = b[6] + lam_UNC[6]*FS_UNC[i,2];
mu_UNC[i,7] = b[7] + lam_UNC[7]*FS_UNC[i,3];
mu_UNC[i,8] = b[8] + lam_UNC[8]*FS_UNC[i,3];
mu_UNC[i,9] = b[9] + lam_UNC[9]*FS_UNC[i,3];
}

}

model{

L_corr_d ~ lkj_corr_cholesky(1);
b ~ normal(0, 10);
lam_UNC ~ normal(0, 10);
sd_p ~ cauchy(0,2.5);
///Sd_d ~ cauchy(0,2.5);

for(i in 1:N){
for(j in 1:P){
X[i,j] ~ normal(mu_UNC[i,j], sd_p[j]);
}
FS_UNC[i] ~ multi_normal_cholesky(M, L_Sigma);
}

}

generated quantities{
real log_lik[N]; ///// log likelihood matrix
real dev; /////// global deviance
real log_lik0; ///// global log likelihood

cov_matrix[P] Sigma;
cov_matrix[D] Phi_lv;
matrix[P,P] lambda_phi_lambda;
cholesky_factor_cov[P] L_Sigma_model;
matrix[P,P] theta_del;

corr_matrix[D] Rho_UNC; /// correlation matrix
corr_matrix[D] Rho; /// correlation matrix
matrix[P,D] lam;  // factor loadings
matrix[N,D] FS; // factor scores, matrix of order [N,D]
///vector[P] mu[N]; // predicted values

/// signed corrected parameters
Rho_UNC = multiply_lower_tri_self_transpose(L_corr_d);
Rho = Rho_UNC;
FS = FS_UNC;
lam = rep_matrix(0, P, D);
lam[1:3,1] = to_vector(lam_UNC[1:3]);
lam[4:6,2] = to_vector(lam_UNC[4:6]);
lam[7:9,3] = to_vector(lam_UNC[7:9]);

// factor 1
if(lam_UNC[1] < 0){
  lam[1:3,1] = to_vector(-1*lam_UNC[1:3]);
  FS[,1] = to_vector(-1*FS_UNC[,1]);
  
  if(lam_UNC[4] > 0){
  Rho[1,2] = -1*Rho_UNC[1,2];
  Rho[2,1] = -1*Rho_UNC[2,1];
  }
  if(lam_UNC[7] > 0){
  Rho[1,3] = -1*Rho_UNC[1,3];
  Rho[3,1] = -1*Rho_UNC[3,1];
  }
}
// factor 2
if(lam_UNC[4] < 0){
  lam[4:6,2] = to_vector(-1*lam_UNC[4:6]);
  FS[,2] = to_vector(-1*FS_UNC[,2]);
  
  if(lam_UNC[1] > 0){
  Rho[2,1] = -1*Rho_UNC[2,1];
  Rho[1,2] = -1*Rho_UNC[1,2];
  }
  if(lam_UNC[7] > 0){
  Rho[2,3] = -1*Rho_UNC[2,3];
  Rho[3,2] = -1*Rho_UNC[3,2];
  }
}
// factor 3
if(lam_UNC[7] < 0){
  lam[7:9,3] = to_vector(-1*lam_UNC[7:9]);
  FS[,3] = to_vector(-1*FS_UNC[,3]);
  
  if(lam_UNC[1] > 0){
  Rho[3,1] = -1*Rho_UNC[3,1];
  Rho[1,3] = -1*Rho_UNC[1,3];
  }
  if(lam_UNC[4] > 0){
  Rho[3,2] = -1*Rho_UNC[3,2];
  Rho[2,3] = -1*Rho_UNC[2,3];
  }
}


/// marginal log-likelihood based on signed corrected parameters
Phi_lv = quad_form_diag(Rho, Sd_d);
lambda_phi_lambda = quad_form_sym(Phi_lv, transpose(lam));
theta_del = diag_matrix(square(sd_p)); // residual variances (sd_p are SDs, so square them)

Sigma = lambda_phi_lambda + theta_del;
L_Sigma_model = cholesky_decompose(Sigma);

for(i in 1:N){
log_lik[i] = multi_normal_cholesky_lpdf(X[i] | b, L_Sigma_model);
}

log_lik0 = sum(log_lik); // global log-likelihood
dev = -2*log_lik0; // model deviance

}

I compared this model with lavaan and blavaan, using simulated data:


library(lavaan)
library(loo)
library(blavaan)
library(rstan)
#rstan_options(auto_write = TRUE)
#options(mc.cores = parallel::detectCores())

mod1 <-'
f1 =~ .5?y1+.5?y2+.5?y3 #+.1?y4 +.2?y8
f2 =~ .4?y4+.4?y5+.4?y6 #+.1?y1 +.2?y7
f3 =~ .6?y7+.6?y8+.6?y9 #+.1?y6 +.2?y3

f1 ~~ .3?f2+.4?f3
f2 ~~ .5?f3'

mod2 <-'
f1 =~ y1+y2+y3
f2 =~ y4+y5+y6
f3 =~ y7+y8+y9

f1 ~~ f2+f3
f2 ~~ f3'

datsim <- simulateData(mod1,sample.nobs=300,std.lv=T)
head(datsim)

fit <- cfa(mod2,data=datsim,std.lv=T,meanstructure=T)
summary(fit,standardized=T,fit.measures=T)


fitb <- bcfa(mod2,data=datsim,std.lv=T,target="stan")
summary(fitb,standardized=T)
fitMeasures(fitb)

###### run the Stan model
X <- as.matrix(datsim)
P <- ncol(datsim) 
N <- nrow(datsim)
D <- 3

dat <- list(N=N,P=P,D=D,X=X, n_lam=9)
param <- c("b","lam","Rho","sd_p","M","Sd_d",
           "log_lik0","dev","log_lik")# 

##### loop to check for convergence 
ta2 <- Sys.time()
iters <- 5000
keep <- 3000
rhat <- 20
while(rhat > 1.05){
  iters <- iters + 3000
    burn <- iters - keep
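  # cfa_stan_fv is assumed to be a character string holding the Stan model shown above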
  
  OUT_st <- stan(data=dat, pars=param,
                 model_code=cfa_stan_fv,
                 chains=3, iter=iters,warmup=burn,
                 control=list(adapt_delta=.99, max_treedepth=20)) 
  
  ss <- summary(OUT_st)$summary
  print(rhat <- max(ss[,"Rhat"], na.rm=T))
}
tb2 <- Sys.time()
tb2 - ta2

OUT_st


print(OUT_st,digits=3, pars=c("b","lam","Rho","sd_p","M","Sd_d",
                              "log_lik0","dev"))

log_lik1 <- extract_log_lik(OUT_st, merge_chains = FALSE) 
(waic_cfa <- waic(log_lik1))
(loo_cfa <- loo(log_lik1, r_eff=relative_eff(exp(log_lik1))))

I teach a BSEM summer course but haven't worked out a multilevel model yet; I will try to develop a couple of useful examples from this. It's an interesting issue.


This is a great example, thanks for posting this. Just trying to follow along here.

I would think that a different approach, in which one specifies the "first" and "remaining" factor loadings separately in the parameters block (building on your example):

parameters {
vector<lower=0, upper=30>[3] lam_first;
vector<lower=-30, upper=30>[6] lam_remain;
...
}

and then puts the lambda vector together in the transformed parameters block:

transformed parameters {
vector[n_lam] lambda;
lambda[1] = lam_first[1];
lambda[4] = lam_first[2];
lambda[7] = lam_first[3];
lambda[2:3] = lam_remain[1:2];
lambda[5:6] = lam_remain[3:4];
lambda[8:9] = lam_remain[5:6];
...
}

should also work and make the sign-correction in the generated quantities block unnecessary. Or am I overlooking something?

I tried this approach first. It kept giving me multimodal posteriors for the unconstrained lambdas. Figuring out why was taking more time than I wanted, so I switched to the sign-correction-in-generated-quantities approach, which worked much more easily.
I would be happy to find out why it wasn't working.

The reason is that you could either have loadings 1–J all be positive, or have loading 1 be positive and loadings 2–J be negative, and it doesn't affect the likelihood very much. In other words, you can flip the latent direction, make loadings 2–J negative, and get the same predictions for (J-1)/J of the items. The only data that suffer are those for item 1; in that case, the sampler can basically just push that loading down to zero (as though it 'wants' to be negative), and item 1 reduces to an intercept-only model. The more items you have, the more this can happen.

Imo, the best approach is to just constrain all loadings to be in the /intended/ direction. Or, just unreverse any reverse-scaled indicators, and use all positive loadings. If you’re concerned about method effects, add a method effect to any reversed items.
In my experience, constraining all loadings to be unidirectional (except for cross-loadings, I suppose), and inverting the responses to meet that, will give good Rhats and unimodality every time.
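For concreteness, a minimal sketch of that constraint for the 9-item, 3-factor layout used above (only the parameters block changes; the other declarations simply mirror Mauricio's model):

parameters{
vector[P] b; // intercepts
matrix[N,D] FS; // factor scores, matrix of order [N,D]
cholesky_factor_corr[D] L_corr_d; // cholesky correlation between factors
vector<lower=0>[P] sd_p; // residual sd for each variable
// every loading constrained to the intended (positive) direction;
// any reverse-scored indicator would be recoded in the data first
vector<lower=0>[n_lam] lam;
}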


Yes, I agree, this is the issue of factor indeterminacy I described above. The issue is that constraining a factor loading to be positive fixes the problem in JAGS, but does not work in Stan. In Stan we have found that the best method is to estimate the model without constraints and sign-adjust the parameters as a function of a chosen factor loading, as in the example I posted before.

What I don't like about this is that you are already deciding that the posterior cannot cover negative values, which in many scenarios could be too strict a constraint.


Hello!

I am constantly facing a mixing problem with a kind of "exploratory factor analysis". I am trying to find the latent variables structuring the covariance of a small (31x16) dataset of count data, with many zeros.

Here is my dataset. Y.csv (1.3 KB)

I tried to simplify the model as much as possible and finished with the following code.

data{
  int N; // sample size
  int S; // number of species
  int D; // number of factors
  int<lower=0> Y[N,S]; // data matrix of order [N,S]
  
  int<lower=0,upper=1> prior_only; //Should we estimate the likelihood?
}
transformed data{
  int<lower=1> M;
  vector[D] FS_mu; // factor means
  vector<lower=0>[D] FS_sd; // factor standard deviations
  M = D*(S-D)+ D*(D-1)/2;  // number of lower-triangular, non-zero loadings
  for (m in 1:D) {
    FS_mu[m] = 0; //Mean of factors = 0
    FS_sd[m] = 1; //Sd of factors = 1
  }
}
parameters{
  //Parameters
  vector[M] L_lower; //Lower diagonal loadings
  vector<lower=0>[D] L_diag; //Positive diagonal elements of loadings
  matrix[N,D] FS; //Factor scores, matrix of order [N,D]
  cholesky_factor_corr[D] Rho; //Correlation matrix between factors
}
transformed parameters{
  // Final parameters
  cholesky_factor_cov[S,D] L; //Final matrix of loadings
  matrix[D,D] Ld; //Cholesky decomposition of the covariance matrix between factors
  
  //Predictors
  matrix[N,S] Ups; //intermediate predictor
  matrix[N,S] Mu; //predictor
  
  // Correlation matrix of factors
  Ld = diag_pre_multiply(FS_sd, Rho); //Fs_sd fixed to 1, Rho estimated
  
  {
    int idx2; //Index for the lower diagonal loadings
    idx2 = 0;
    
    // Constraints to allow identifiability of loadings
  	 for(i in 1:(D-1)) { for(j in (i+1):(D)){ L[i,j] = 0; } } //0 on upper diagonal
  	 for(i in 1:D) L[i,i] = L_diag[i]; //Positive values on diagonal
  	 for(j in 1:D) {
  	   for(i in (j+1):S) {
  	     idx2 = idx2+1;
  	     L[i,j] = L_lower[idx2]; //Insert lower diagonal values in loadings matrix
  	   }
  	 }
  }
  
  // Predictors
  Ups = FS * L';
  for(n in 1:N) Mu[n,] = Ups[n,];
  
}
model{
  // Priors
  L_lower ~ student_t(7,0,0.5); //Weakly informative prior for lower diag loadings
  L_diag ~ student_t(7,0,0.5);//Weakly informative prior for diag loadings
  Rho ~ lkj_corr_cholesky(2); //Weakly informative prior for Rho
  
  for(i in 1:N){	
    if(prior_only == 0) Y[i,] ~ poisson_log(Mu[i,]);
    FS[i,] ~ multi_normal_cholesky(FS_mu, Ld);
  }
}
generated quantities{
  matrix[S,S] cov_L;
  matrix[S,S] cor_L;
  matrix[N,S] Y_pred;
  matrix[N,S] log_lik1;
  vector[N*S] log_lik;
  
  cov_L = multiply_lower_tri_self_transpose(L); //Create the covariance matrix
  
  // Compute the correlation matrix from the covariance matrix
  for(i in 1:S){
    for(j in 1:S){
      cor_L[i,j] = cov_L[i,j]/sqrt(cov_L[i,i]*cov_L[j,j]);
    }
  }
  
  //Compute the likelihood and predictions for each observation
  for(n in 1:N){
    for(s in 1:S){
      log_lik1[n,s] = poisson_log_lpmf(Y[n,s] | Mu[n,s]);
      Y_pred[n,s] = poisson_log_rng(Mu[n,s]);
    }
  }
  log_lik = to_vector(log_lik1); //Transform into a vector usable by the loo package
}

I thought this model would work without any trouble, because a more complex version of it worked well on simulated data. In the current model I removed the intercept and hierarchical parts. The zero upper-diagonal and the positive diagonal loadings seemed to be sufficient to allow efficient sampling with simulated data.

However, I cannot get proper sampling with my real data set. Perhaps because of the abundance of zeros? They happen because the studied sites are highly different.

Would you have any suggestions? I was wondering if I should try your "change the sign" technique, maybe with reference to the sign of the diagonal of the unconstrained loadings?

Thank you very much, and have a good day!
Lucas

And here is the code to run the model

## Compile the model
GBFM_mod <- stan_model(
  "Markdown/1_SEM_Teabags/Poisson_GBFM_Reparam.stan")

## Prepare stan data
GBFM_data <- list(
  Y = Y,
  N = nrow(Y),
  S = ncol(Y),
  D = 2,
  prior_only = 0)

# Sample from the posterior
GBFM_fit_3 <- sampling(GBFM_mod, data = GBFM_data,
                       iter = 2000,
                       cores = 3, chains = 3)

I tried a hurdle version, but nothing changed; the trouble really seems to be identifying the loadings.

data{
  int N; // sample size
  int S; // number of species
  int D; // number of factors
  int<lower=0> Y[N,S]; // data matrix of order [N,S]
  
  int<lower=0,upper=1> prior_only; //Should we estimate the likelihood?
}
transformed data{
  //Hyperparameters of factor scores
  int<lower=1> M;
  vector[D] FS_mu; // factor means
  vector<lower=0>[D] FS_sd; // factor standard deviations
  
  //Factors
  M = D*(S-D)+ D*(D-1)/2;  // number of lower-triangular, non-zero loadings
  for (m in 1:D) {
    FS_mu[m] = 0; //Mean of factors = 0
    FS_sd[m] = 1; //Sd of factors = 1
  }
}
parameters{
  //Parameters
  vector[M] L_lower; //Lower diagonal loadings
  vector<lower=0>[D] L_diag; //Positive diagonal elements of loadings
  matrix[N,D] FS; //Factor scores, matrix of order [N,D]
  cholesky_factor_corr[D] Rho; //Correlation matrix between factors
  real tau; //Shape of the zero counts
}
transformed parameters{
  // Final parameters
  cholesky_factor_cov[S,D] L; //Final matrix of loadings
  matrix[D,D] Ld; //Cholesky decomposition of the covariance matrix between factors
  
  //Predictors
  matrix[N,S] Ups; //intermediate predictor
  matrix[N,S] Theta; //predictor for 0 or 1
  matrix[N,S] Mu; //Predictor for counts
  
  // Correlation matrix of factors
  Ld = diag_pre_multiply(FS_sd, Rho); //Fs_sd fixed to 1, Rho estimated
  
  {
    int idx2; //Index for the lower diagonal loadings
    idx2 = 0;
    
    // Constraints to allow identifiability of loadings
  	 for(i in 1:(D-1)) { for(j in (i+1):(D)){ L[i,j] = 0; } } //0 on upper diagonal
  	 for(i in 1:D) L[i,i] = L_diag[i]; //Positive values on diagonal
  	 for(j in 1:D) {
  	   for(i in (j+1):S) {
  	     idx2 = idx2+1;
  	     L[i,j] = L_lower[idx2]; //Insert lower diagonal values in loadings matrix
  	   }
  	 }
  }
  
  // Predictors
  Ups = FS * L';
  for(n in 1:N) {
    Mu[n,] = exp(Ups[n,]);
    Theta[n,] = inv_logit(tau * Ups[n,]);
  }
  
}
model{
  // Priors
  L_lower ~ student_t(7,0,0.5); //Weakly informative prior for lower diag loadings
  L_diag ~ student_t(7,0,0.5);//Weakly informative prior for diag loadings
  Rho ~ lkj_corr_cholesky(2); //Weakly informative prior for Rho
  tau ~ student_t(7,0,1); 
  
  for(n in 1:N){
    for(s in 1:S){
      if(prior_only == 0){
      if (Y[n,s] == 0)
        target += log(Theta[n,s]);
      else
        target += log1m(Theta[n,s]) + poisson_lpmf(Y[n,s] | Mu[n,s])
                - log1m_exp(-Mu[n,s]);
      }
    }
    FS[n,] ~ multi_normal_cholesky(FS_mu, Ld);
  }
}
generated quantities{
  // matrix[S,S] cov_L;
  // matrix[S,S] cor_L;
  // matrix[N,S] Y_pred;
  // matrix[N,S] log_lik1;
  // vector[N*S] log_lik;
  // 
  // cov_L = multiply_lower_tri_self_transpose(L); //Create the covariance matrix
  // 
  // // Compute the correlation matrix from de covariance matrix
  // for(i in 1:S){
  //   for(j in 1:S){
  //     cor_L[i,j] = cov_L[i,j]/sqrt(cov_L[i,i]*cov_L[j,j]);
  //   }
  // }
  // 
  // //Compute the likelihood and predictions for each observation
  // for(n in 1:N){
  //   for(s in 1:S){
  //     log_lik1[n,s] = poisson_log_lpmf(Y[n,s] | Mu[n,s]);
  //     Y_pred[n,s] = poisson_log_rng(Mu[n,s]);
  //   }
  // }
  // log_lik = to_vector(log_lik1); //Tranform in a vector usable by loo package
}

EDIT: The resulting traceplots:


Hi Lucas, these traceplots are indicative of factor indeterminacy (you might also see the term 'reflection invariance' get thrown around).

If you haven’t had any luck with the current constraints that you’re using, have you tried the sign-correction approach that Mauricio posted above? I’ve had a lot of success using that with my own latent variable models.


Hi!

Thank you for your response!

I think I will use the sign correction approach; I agree with your traceplot diagnosis! It is quite clear, with some loadings showing sign-switching. Thank you for pointing that out!

I am afraid I need some clarification though. Is there some guidance about choosing the indicator loading (which will determine the sign of all the loadings in the same column, if I understand correctly)? Is there any problem with ensuring that LL' is positive semi-definite?

Have a good day!
Lucas

Lucas

I agree, the sign correction should fix the factor indeterminacy. In the future, you can diagnose this by plotting histograms of the factor loadings. If a plot shows a bimodal distribution, with the same/similar estimate appearing as both positive and negative, that is a strong indication of factor indeterminacy.

As for recommendations, the best one is to let the indicator with the strongest factor loading (in absolute value) set the direction of the factor. About LL', I am sorry, I don't understand your question well enough; is L the factor loading matrix?
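A minimal sketch (untested) of what that could look like in the generated quantities block, where L_UNC and FS_UNC stand for unconstrained versions of your L and FS (hypothetical names), and the marker for factor d is whichever row of column d has the largest absolute loading in that draw:

generated quantities{
  matrix[S,D] L = L_UNC;   // sign-corrected loadings
  matrix[N,D] FS = FS_UNC; // sign-corrected factor scores
  for(d in 1:D){
    int marker = 1;
    // find the indicator with the strongest loading (absolute value) on factor d
    for(s in 2:S)
      if(fabs(L_UNC[s,d]) > fabs(L_UNC[marker,d])) marker = s;
    // if that marker loading is negative, flip the whole column and its factor scores
    if(L_UNC[marker,d] < 0){
      L[,d] = -L_UNC[,d];
      FS[,d] = -FS_UNC[,d];
    }
  }
  // the factor correlation matrix needs the matching row/column flips as well
}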

Thank you for your response Mauricio!

Sign correction with the diag loading as indicator did not lead to convergence, but I will try a more principled way by finding the strongest loading!

L was indeed the loading matrix in my previous message.

Have a good day!
Lucas

I tried keeping the upper-diagonal constraints and changing the sign based on the sign of the diagonal loading. It is not a success yet, but it does look more promising! Am I missing something in the following code?

data{
  int N; // sample size
  int S; // number of species
  int D; // number of factors
  int<lower=0> Y[N,S]; // data matrix of order [N,S]
  
  int<lower=0,upper=1> prior_only; //Should we estimate the likelihood?
}
transformed data{
  int<lower=1> M;
  vector[D] FS_mu; // factor means
  vector<lower=0>[D] FS_sd; // factor standard deviations
  M = D*(S-D)+ D*(D-1)/2;  // number of lower-triangular, non-zero loadings
  for (m in 1:D) {
    FS_mu[m] = 0; //Mean of factors = 0
    FS_sd[m] = 1; //Sd of factors = 1
  }
}
parameters{
  //Parameters
  vector[M] L_lower; //Lower diagonal loadings
  vector[D] L_diag; //Diagonal elements of loadings
  matrix[N,D] FS_UNC; //Factor scores, matrix of order [N,D]
  cholesky_factor_corr[D] Rho; //Correlation matrix between factors
}
transformed parameters{
  // Final parameters
  matrix[S,D] L_UNC; //Unconstrained matrix of loadings
  matrix[D,D] Ld; //Cholesky decomposition of the covariance matrix between factors
  //Predictors
  matrix[N,S] Ups; //intermediate predictor
  matrix[N,S] Mu; //predictor
  
  // Correlation matrix of factors
  Ld = diag_pre_multiply(FS_sd, Rho); //Fs_sd fixed to 1, Rho estimated
  
  {
    int idx2; //Index for the lower diagonal loadings
    idx2 = 0;
    
    // Constraints to allow identifiability of loadings
  	 for(i in 1:(D-1)) { for(j in (i+1):(D)){ L_UNC[i,j] = 0; } } //0 on upper diagonal
  	 for(i in 1:D) L_UNC[i,i] = L_diag[i]; //Diagonal values (no longer constrained to be positive)
  	 for(j in 1:D) {
  	   for(i in (j+1):S) {
  	     idx2 = idx2+1;
  	     L_UNC[i,j] = L_lower[idx2]; //Insert lower diagonal values in loadings matrix
  	   }
  	 }
  }
  
  // Predictors
  Ups = FS_UNC * L_UNC';
  for(n in 1:N) Mu[n,] = Ups[n,];
  
}
model{
  // Priors
  L_lower ~ normal(0,5); //Weakly informative prior for lower diag loadings
  L_diag ~ normal(0,5);//Weakly informative prior for diag loadings
  Rho ~ lkj_corr_cholesky(2); //Weakly informative prior for Rho
  
  for(i in 1:N){	
    if(prior_only == 0) Y[i,] ~ poisson_log(Mu[i,]);
    FS_UNC[i,] ~ multi_normal_cholesky(FS_mu, Ld);
  }
}
generated quantities{
  // Sign changing
  matrix[S,D] L = L_UNC; //Constrained loading matrix
  matrix[N,D] FS = FS_UNC; //Constrained factor scores
  // Covariance an correlation
  matrix[S,S] cov_L;
  matrix[S,S] cor_L;
  // Prediction and log-lik
  matrix[N,S] Y_pred;
  matrix[N,S] log_lik1;
  vector[N*S] log_lik;
  
  // Sign changing based on the sign of the diagonal loading
  for(d in 1:D){
    if(L_UNC[d,d] < 0){
      L[,d] = -1 * L_UNC[,d];
      FS[,d] = -1 * FS_UNC[,d];
    }
  }
  
  cov_L = multiply_lower_tri_self_transpose(L); //Create the covariance matrix
  
  // Compute the correlation matrix from the covariance matrix
  for(i in 1:S){
    for(j in 1:S){
      cor_L[i,j] = cov_L[i,j]/sqrt(cov_L[i,i]*cov_L[j,j]);
    }
  }
  
  //Compute the likelihood and predictions for each observation
  for(n in 1:N){
    for(s in 1:S){
      log_lik1[n,s] = poisson_log_lpmf(Y[n,s] | Mu[n,s]);
      Y_pred[n,s] = poisson_log_rng(Mu[n,s]);
    }
  }
  log_lik = to_vector(log_lik1); //Transform into a vector usable by the loo package
}

Thank you!
Lucas

Lucas

I am not sure about the necessary constraints for exploratory factor analysis. For example, in the semTools package the identification constraint is that the sum of the multiplied factor loadings is 0, i.e. the row-wise products of the factor loadings matrix sum to 0.

This sign adjustment looks correct. But you are missing the correction of the factor correlations, as in this section from my previous code:


Rho_UNC = multiply_lower_tri_self_transpose(L_corr_d);
Rho = Rho_UNC;

// factor 1
if(lam_UNC[1] < 0){
  lam[1:3,1] = to_vector(-1*lam_UNC[1:3]);
  FS[,1] = to_vector(-1*FS_UNC[,1]);
  
  if(lam_UNC[4] > 0){
  Rho[1,2] = -1*Rho_UNC[1,2];
  Rho[2,1] = -1*Rho_UNC[2,1];
  }
  if(lam_UNC[7] > 0){
  Rho[1,3] = -1*Rho_UNC[1,3];
  Rho[3,1] = -1*Rho_UNC[3,1];
  }
}

Although I am not sure this will fix the weird traceplots

Hello!

As a follow-up, the model converged well when I increased the adapt_delta argument to 0.9!

I am still searching for a way to write a combination of loops and if statements that corrects the sign of Rho without having to change the code when the number of factors changes.

I looked at the code generated by blavaan, but it is unclear to me how one can include predictors of factor scores together with sign correction. Will the sign correction have to be propagated to the slopes linking predictors to factor scores too? The way predictors of latent variables are handled in blavaan overwhelms me.

Thank you very much for all your help!
Lucas

Lucas

Glad it is working now

I agree, I haven't found a way to automate this section of sign adjusting.

Yes, basically sign-correct the slope like another correlation, following the same rules/steps as before.
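For what it is worth, a minimal sketch (untested) of one way the pairwise flips could be written generically over D factors: compute a ±1 sign per factor, and every quantity tied to a pair of factors then picks up the product of the two signs. Here Rho_UNC stands for the unconstrained correlation matrix (multiply_lower_tri_self_transpose of its Cholesky factor), and flip, Rho_c, out, pred and reg_c are placeholder names:

generated quantities{
  vector[D] flip;            // +1 or -1 per factor
  matrix[S,D] L = L_UNC;
  matrix[N,D] FS = FS_UNC;
  matrix[D,D] Rho_c;         // sign-corrected factor correlations

  for(d in 1:D)
    flip[d] = (L_UNC[d,d] < 0) ? -1 : 1;  // marker = diagonal loading, as in your model

  for(d in 1:D){
    L[,d] = flip[d] * L_UNC[,d];
    FS[,d] = flip[d] * FS_UNC[,d];
  }
  // Rho_c[i,j] = flip[i] * flip[j] * Rho_UNC[i,j]
  Rho_c = quad_form_diag(Rho_UNC, flip);
  // a regression slope from factor pred[k] to factor out[k] flips the same way:
  // reg_c[k] = flip[out[k]] * flip[pred[k]] * reg_UNC[k];
}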

blavaan handles the regressions in a much fancier way than I can code, which does make it harder to read sometimes.

The way I have coded regressions between factors follows the "rules" listed below, with the changes in the model section. For an example like this:

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 
visual ~ textual + speed
'
  1. The outcome factor is set to follow a normal() instead of a multi_normal(), because it is no longer correlated with anything else.
  2. The two predictor factors still follow a multi_normal(), for the correlation between them.
  3. The means of the predictor factors are fixed to 0, and their SDs to 1.
  4. The mean of the outcome factor is a function of the regression (M3), with an intercept of 0; to keep the factor identification constraints, the total variance is set to 1.

I don't have a full example, but here are the transformed parameters and model sections. In the generated quantities block you would need to add the slopes into the sign-correction if statements.

transformed parameters{
vector[D-1] M;
vector<lower=0, upper=100>[D-1] Sd_d; // standard deviations
real mu[N,P];
matrix[D-1,D-1] Ly;// cholesky of the corr between factors
vector[N] M3;

for (m in 1:(D-1)) {
M[m] = 0;
Sd_d[m] = 1;}

Ly = diag_pre_multiply(Sd_d, L_corr_d);

for(i in 1:N){
mu[i,1] = b[1] + lam[1]*FS[i,1];
mu[i,2] = b[2] + lam[2]*FS[i,1];
mu[i,3] = b[3] + lam[3]*FS[i,1];
mu[i,4] = b[4] + lam[4]*FS[i,2];
mu[i,5] = b[5] + lam[5]*FS[i,2];
mu[i,6] = b[6] + lam[6]*FS[i,2];
mu[i,7] = b[7] + lam[7]*FS3[i];
mu[i,8] = b[8] + lam[8]*FS3[i];
mu[i,9] = b[9] + lam[9]*FS3[i];

M3[i] = reg[1]*FS[i,1] + reg[2]*FS[i,2];
}
}


model{

L_corr_d ~ lkj_corr_cholesky(2);
b ~ normal(0, 5);
lam ~ normal(0.4,10);
reg ~ normal(0.2,10);

FS3 ~ normal(M3, 1);

for(i in 1:N){  
FS[i] ~ multi_normal_cholesky(M, Ly);
}

}

Hope something like this works
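A minimal sketch (untested) of how those slopes could be folded into the sign-correction if statements, assuming reg is a vector[2] of slopes and items 1, 4 and 7 are the markers as in the 9-item example (reg_adj is a placeholder name):

generated quantities{
  // ... the same sign corrections of loadings, factor scores and correlations as before ...
  vector[2] reg_adj = reg;  // copy of the unconstrained slopes

  // if the outcome factor (items 7-9, the one with mean M3) flips, both slopes flip
  if(lam[7] < 0) reg_adj = -reg_adj;
  // if a predictor factor flips, its slope flips again
  if(lam[1] < 0) reg_adj[1] = -reg_adj[1];
  if(lam[4] < 0) reg_adj[2] = -reg_adj[2];
}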

Thank you very much @Mauricio_Garnier-Villarre, I will give it a try soon!

Lucas