Non-convergence of latent variable model

Hi Lucas, these traceplots are indicative of factor indeterminacy (you might also see the term ‘reflection invariance’ thrown around).

If you haven’t had any luck with the current constraints that you’re using, have you tried the sign-correction approach that Mauricio posted above? I’ve had a lot of success using that with my own latent variable models.


Hi!

Thank you for your response!

I think I will use the sign-correction approach, and I agree with your traceplot diagnostic! It is quite neat, with some loadings clearly showing sign-switching. Thank you for pointing that out!

I am afraid I need some clarification though. Is there any guidance on choosing the indicator loading (which, if I understand correctly, will determine the sign of all the loadings in the same column)? And is there any problem with ensuring that LL' stays positive semidefinite?

Have a good day!
Lucas

Lucas

I agree, the sign correction should fix the factor indeterminacy. In the future, you can diagnose this by plotting histograms of the factor loadings: a bimodal distribution, with the same (or similar) estimate appearing as both positive and negative, is a strong indication of factor indeterminacy.

About recommendations, the best one is to let the indicator with the strongest factor loading (in absolute value) set the direction of the factor. About LL, I am sorry, I don't quite understand your notation; is L the factor loading matrix?
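In Stan that rule could be sketched in the generated quantities block roughly like this (a hypothetical fragment: L_UNC/FS_UNC would be the unconstrained loadings and scores, L/FS their sign-corrected copies, and S and D the numbers of indicators and factors). In practice you may want to fix the chosen indicator from a pilot run rather than re-selecting it at every draw:

// Sketch: let each factor's strongest loading set the factor's direction
for(d in 1:D){
  int strongest = 1;
  for(s in 2:S)
    if(fabs(L_UNC[s,d]) > fabs(L_UNC[strongest,d]))
      strongest = s;
  if(L_UNC[strongest,d] < 0){
    L[,d] = -1 * L_UNC[,d];   //flip the whole column of loadings
    FS[,d] = -1 * FS_UNC[,d]; //and the matching factor scores
  }
}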

Thank you for your response Mauricio!

Sign correction with the diagonal loading as the indicator did not lead to convergence, but I will try a more principled approach by finding the strongest loading!

L was indeed the loading matrix in my previous message.

Have a good day!
Lucas

I tried keeping the upper-diagonal constraints and sign-correcting based on the sign of the diagonal loadings. It is not a success yet, but it does look more promising! Am I missing something in the following code?

data{
  int N; // sample size
  int S; // number of species
  int D; // number of factors
  int<lower=0> Y[N,S]; // data matrix of order [N,S]
  
  int<lower=0,upper=1> prior_only; //Should we estimate the likelihood?
}
transformed data{
  int<lower=1> M;
  vector[D] FS_mu; // factor means
  vector<lower=0>[D] FS_sd; // factor standard deviations
  M = D*(S-D)+ D*(D-1)/2;  // number of lower-triangular, non-zero loadings
  for (m in 1:D) {
    FS_mu[m] = 0; //Mean of factors = 0
    FS_sd[m] = 1; //Sd of factors = 1
  }
}
parameters{
  //Parameters
  vector[M] L_lower; //Lower diagonal loadings
  vector[D] L_diag; //Diagonal elements of loadings
  matrix[N,D] FS_UNC; //Factor scores, matrix of order [N,D]
  cholesky_factor_corr[D] Rho; //Cholesky factor of the correlation matrix between factors
}
transformed parameters{
  // Final parameters
  matrix[S,D] L_UNC; //Final matrix of loadings
  matrix[D,D] Ld; //Cholesky decomposition of the covariance matrix between factors
  //Predictors
  matrix[N,S] Ups; //intermediate predictor
  matrix[N,S] Mu; //predictor
  
  // Correlation matrix of factors
  Ld = diag_pre_multiply(FS_sd, Rho); //Fs_sd fixed to 1, Rho estimated
  
  {
    int idx2; //Index for the lower diagonal loadings
    idx2 = 0;
    
    // Constraints to allow identifiability of loadings
    for(i in 1:(D-1)) { for(j in (i+1):D) { L_UNC[i,j] = 0; } } //Zeros above the diagonal
    for(i in 1:D) L_UNC[i,i] = L_diag[i]; //Diagonal values (unconstrained; sign-corrected in generated quantities)
    for(j in 1:D) {
      for(i in (j+1):S) {
        idx2 = idx2 + 1;
        L_UNC[i,j] = L_lower[idx2]; //Fill the lower triangle of the loading matrix
      }
    }
  }
  
  // Predictors
  Ups = FS_UNC * L_UNC';
  Mu = Ups; //No additional predictors here, so Mu is simply Ups
  
}
model{
  // Priors
  L_lower ~ normal(0,5); //Weakly informative prior for lower diag loadings
  L_diag ~ normal(0,5); //Weakly informative prior for diag loadings
  Rho ~ lkj_corr_cholesky(2); //Weakly informative prior for Rho
  
  for(i in 1:N){	
    if(prior_only == 0) Y[i,] ~ poisson_log(Mu[i,]);
    FS_UNC[i,] ~ multi_normal_cholesky(FS_mu, Ld);
  }
}
generated quantities{
  // Sign changing
  matrix[S,D] L = L_UNC; //Constrained loading matrix
  matrix[N,D] FS = FS_UNC; //Constrained factor scores
  // Covariance and correlation
  matrix[S,S] cov_L;
  matrix[S,S] cor_L;
  // Prediction and log-lik
  matrix[N,S] Y_pred;
  matrix[N,S] log_lik1;
  vector[N*S] log_lik;
  
  // Sign changing based on the sign of the diagonal loading
  for(d in 1:D){
    if(L_UNC[d,d] < 0){
      L[,d] = -1 * L_UNC[,d];
      FS[,d] = -1 * FS_UNC[,d];
    }
  }
  
  cov_L = multiply_lower_tri_self_transpose(L); //Create the covariance matrix
  
  // Compute the correlation matrix from the covariance matrix
  for(i in 1:S){
    for(j in 1:S){
      cor_L[i,j] = cov_L[i,j]/sqrt(cov_L[i,i]*cov_L[j,j]);
    }
  }
  
  //Compute the likelihood and predictions for each observation
  for(n in 1:N){
    for(s in 1:S){
      log_lik1[n,s] = poisson_log_lpmf(Y[n,s] | Mu[n,s]);
      Y_pred[n,s] = poisson_log_rng(Mu[n,s]);
    }
  }
  log_lik = to_vector(log_lik1); //Flatten to a vector usable by the loo package
}

Thank you!
Lucas

Lucas

I am not sure about the necessary constraints for exploratory factor analysis. For example, in the semTools package the identification constraint is that the sum of the products of the factor loadings is 0, i.e. multiplying the loading matrix row by row across each pair of factors and summing gives 0.

This sign adjustment seems to be correct, but you are missing the correction of the factor correlations. Here is the relevant section from my previous code:


Rho_UNC = multiply_lower_tri_self_transpose(L_corr_d); //unconstrained factor correlations
Rho = Rho_UNC;

// factor 1: flip its loadings and scores if the first loading is negative
if(lam_UNC[1] < 0){
  lam[1:3,1] = to_vector(-1*lam_UNC[1:3]);
  FS[,1] = to_vector(-1*FS_UNC[,1]);

  // if factor 2 was not flipped too, the 1-2 correlation changes sign
  if(lam_UNC[4] > 0){
    Rho[1,2] = -1*Rho_UNC[1,2];
    Rho[2,1] = -1*Rho_UNC[2,1];
  }
  // if factor 3 was not flipped too, the 1-3 correlation changes sign
  if(lam_UNC[7] > 0){
    Rho[1,3] = -1*Rho_UNC[1,3];
    Rho[3,1] = -1*Rho_UNC[3,1];
  }
}

Although I am not sure this will fix the weird traceplots.

Hello!

As a follow-up, the model converged well when I increased the adapt_delta argument to 0.9!

I am still searching for a way to write a combination of loops and if statements that corrects the sign of Rho without having to change the code when the number of factors changes.

I looked at the code generated by blavaan, but it is unclear to me how one can include predictors of factor scores together with sign correction. Will the sign correction have to be propagated to the slopes linking predictors to factor scores too? I think the way blavaan handles predictors of latent variables overwhelms me.

Thank you very much for all your help!
Lucas

Lucas

Glad it is working now

I agree, I haven't found a way to automate this sign-adjusting section.
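One sketch that might generalize it (untested, and borrowing the L_UNC, FS_UNC, Ld, S, D names from your model above) is to store the flips in a vector and multiply each correlation by the product of the two factors' signs:

// Hypothetical generalized sign correction for any number of factors
matrix[D,D] R = multiply_lower_tri_self_transpose(Ld); //factor correlation matrix
{
  vector[D] flip;
  for(d in 1:D)
    flip[d] = L_UNC[d,d] < 0 ? -1 : 1; //or index the strongest loading per column instead
  for(d in 1:D){
    L[,d] = flip[d] * L_UNC[,d];
    FS[,d] = flip[d] * FS_UNC[,d];
  }
  for(i in 1:D)
    for(j in 1:D)
      R[i,j] = flip[i] * flip[j] * R[i,j]; //only pairs with opposite flips change sign
}

The diagonal is untouched since flip[d]^2 = 1, and the same flip vector could later be reused for regression slopes.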

Yes, basically you sign-correct the slope like another correlation, following the same rules/steps as before.

blavaan handles the regressions in a much fancier way than I can code, which does make it harder to read sometimes.

For how I have coded the regressions between factors, the changes in the model section follow the “rules” below. For an example like this:

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 
visual ~ textual + speed
'
  1. The outcome factor is set to follow a normal() instead of a multivariate normal, because it is no longer correlated with anything else
  2. The two predictor factors still follow a multivariate normal, to keep the correlation between them
  3. The means of the predictor factors are fixed to 0, and their SDs to 1
  4. The mean of the outcome factor is a function of the regression (M3 in the code below), with an intercept of 0; to keep the factor identification constraints, its total variance is set to 1

I don't have a full example, but here is what the transformed parameters and model sections would look like. In the generated quantities you would need to add the slopes into the sign-correction if statements, as sketched after the code below.

transformed parameters{
  vector[D-1] M; //means of the predictor factors
  vector<lower=0, upper=100>[D-1] Sd_d; //standard deviations of the predictor factors
  real mu[N,P]; //item-level predictor
  matrix[D-1,D-1] Ly; //cholesky of the correlation between predictor factors
  vector[N] M3; //regression-implied mean of the outcome factor

  for (m in 1:(D-1)) {
    M[m] = 0;
    Sd_d[m] = 1;
  }

  Ly = diag_pre_multiply(Sd_d, L_corr_d);

  for(i in 1:N){
    mu[i,1] = b[1] + lam[1]*FS[i,1];
    mu[i,2] = b[2] + lam[2]*FS[i,1];
    mu[i,3] = b[3] + lam[3]*FS[i,1];
    mu[i,4] = b[4] + lam[4]*FS[i,2];
    mu[i,5] = b[5] + lam[5]*FS[i,2];
    mu[i,6] = b[6] + lam[6]*FS[i,2];
    mu[i,7] = b[7] + lam[7]*FS3[i];
    mu[i,8] = b[8] + lam[8]*FS3[i];
    mu[i,9] = b[9] + lam[9]*FS3[i];

    M3[i] = reg[1]*FS[i,1] + reg[2]*FS[i,2]; //regression of the outcome factor on the predictors
  }
}


model{
  // Priors
  L_corr_d ~ lkj_corr_cholesky(2);
  b ~ normal(0, 5);
  lam ~ normal(0.4, 10);
  reg ~ normal(0.2, 10);

  // Outcome factor: normal with regression-implied mean, total variance fixed to 1
  FS3 ~ normal(M3, 1);

  // Predictor factors: multivariate normal with mean 0 and SD 1
  for(i in 1:N){
    FS[i] ~ multi_normal_cholesky(M, Ly);
  }
}
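In the generated quantities, the slope part of the sign correction could then look something like this (a hypothetical fragment: reg_c is a name I am making up for the corrected slopes, and lam_UNC and the indicator indices follow the earlier fragment):

// Hypothetical sketch: propagate the sign flips to the regression slopes
vector[2] reg_c = reg;
if(lam_UNC[1] < 0) reg_c[1] = -reg_c[1]; //predictor factor 1 was flipped
if(lam_UNC[4] < 0) reg_c[2] = -reg_c[2]; //predictor factor 2 was flipped
if(lam_UNC[7] < 0) reg_c = -reg_c; //outcome factor was flipped: all its slopes flip

A slope flips once for each flipped endpoint, so if both of its factors flip, the two corrections cancel.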

Hope something like this works

Thank you very much @Mauricio_Garnier-Villarre, I will give it a try soon!

Lucas