Non-convergence of latent variable model

I am having problems with model convergence even after increasing the number of iterations to as many as 50000. Rhat isn’t close to 1.
I have attached a copy of my code, which models a CFA using the Holzinger Swineford data from lavaan.

data {
  int N; //number of subjects
  int Nf; // number of factors
  int ind; //number of between level indicators
  matrix[ind,Nf] l_p; // loading patterns
  vector[ind] Y[N]; // between level indicators. In example, only 6 indicators
  int <lower =1> Nf_nzero; //number of non zero loadings
}


parameters {
  matrix[2,Nf] Nf_ld; 
  //vector[ind] intercepts;
  cholesky_factor_corr[Nf] Nf_corr; 
  vector <lower = 0> [ind] sd_res; 
  vector <lower = 0> [Nf] Nf_var;
}

transformed parameters {
 
  row_vector[Nf] ones;
  matrix[3, Nf] ld_matrix;
  matrix[ind, Nf] loadings;
  vector[Nf_nzero] vec_load; 
  cov_matrix[ind] Sigma;
  cov_matrix[ind] theta_del;
  corr_matrix[Nf] omega;
  cov_matrix[Nf] Phi_nf;
  matrix[ind,ind] lambda_phi_lambda;
  vector[Nf] zeros;
  
  
  zeros = rep_vector(0, Nf);
  
  
  ones = rep_row_vector(1, Nf);
  ld_matrix = append_row(ones, Nf_ld); 
  vec_load = to_vector(ld_matrix);

   {
    int pos = 0;
    for (i in 1:ind){
      for (j in 1:Nf) {
        
        if(l_p[i,j] != 0) {
          pos = pos + 1;
          loadings[i,j] = vec_load[pos];
        } else {
          loadings[i,j] = 0;
        }
      }
    }
  }  
  
  omega = multiply_lower_tri_self_transpose(Nf_corr);
  
  Phi_nf = quad_form_diag(omega, Nf_var);
  
  lambda_phi_lambda = quad_form_sym(Phi_nf, loadings');
  
  theta_del = diag_matrix(sd_res);
  
  Sigma = lambda_phi_lambda + theta_del;
}
model {
  
   Y ~ multi_normal(rep_vector(0, ind), Sigma);
   
   //priors
  sd_res ~ cauchy(0,5); 
  Nf_var ~ cauchy(0,5);

  to_vector(Nf_ld) ~ normal(0,10);
  
  Nf_corr ~ lkj_corr_cholesky(1);
}

So I’m using the reference indicator approach to scale the latent variables, i.e., the first loading of each latent variable is set to one.

Y \sim N(0, \Lambda\Phi\Lambda' + \theta_{\delta}), where \Lambda represents my loadings, \Phi is the covariance matrix of the latent variables, and \theta_{\delta} is the diagonal matrix of residual variances.

I will appreciate any assistance that can be provided.

I am having a hard time following your indexing; I can try to follow it with more time.

But I thought it relevant to mention that Ed Merkle has developed the package blavaan, which runs SEM models based on lavaan syntax. It can be run with either Stan or JAGS, with a bunch of added features designed to stay as similar to lavaan as possible.

Like any package, it does not work for every model yet, but it works well for the features that have been developed, without convergence problems.

blavaan

Also, you can ask blavaan to export the Stan/JAGS model syntax

Slow convergence probably means the posterior is multimodal or nonidentified. More informative priors may help; a normal prior would be better than a Cauchy (which is very wide). So I’d start by guesstimating the largest sd_res and Nf_var that could reasonably occur and set

sd_res ~ normal(0,max_sd/2); 
Nf_var ~ normal(0,max_var/2);

I am unable to use Ed Merkle’s marvelous blavaan package this time because I am trying to build a multilevel structural equation model, which blavaan unfortunately does not support at this time.

I see, I didn’t notice that it was a multilevel CFA. I would recommend looking at the Song and Lee book; they have a code example for multilevel SEM in Chapter 6. The code is in WinBUGS, but it can be translated into Stan.

Song, X.-Y., & Lee, S.-Y. (2012). Basic and Advanced Bayesian Structural Equation Modeling: With Applications in the Medical and Behavioral Sciences. West Sussex, United Kingdom: John Wiley & Sons, Ltd.

Also, another comment: you are now writing the code in terms of the covariance-matrix equations (which are faster), but I think for multilevel models it would be easier to code it in terms of the data model, with data augmentation for the factor scores (at least easier for how I code).

I don’t think the Holzinger Swineford data are a good example to work with here, as they only have 2 groups (which might be too few for a multilevel solution). I will try to replicate an output from the Mplus examples in Chapter 9 (Multilevel Modeling with Complex Survey Data), such as example 9.6. This should be more straightforward, and having an output to compare against can help check the model.

A couple of comments/questions on your code:

You define the factor means (zeros) at 0, but I don’t see them being used later in the code.

Here you are defining the item intercepts to be fixed at 0; they are not being estimated.

What is the Rhat you are having problems with? How large is it, and is it for all parameters?

I always prefer to scale the factors by fixing their variances at 1 and their means at 0, instead of using a marker variable. This gives the parameters an interpretation in terms of a standardized factor, instead of in terms of the chosen indicator.
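Roughly, in your model that scaling would look something like the sketch below (just a sketch reusing your data block and variable names, not tested): every non-zero loading in vec_load becomes a free parameter, Nf_var drops out, and the factor covariance matrix reduces to the correlation matrix.

parameters {
  vector[Nf_nzero] vec_load; // all non-zero loadings are now free
  cholesky_factor_corr[Nf] Nf_corr;
  vector<lower=0>[ind] sd_res;
}

transformed parameters {
  matrix[ind, Nf] loadings;
  corr_matrix[Nf] Phi_nf; // factor covariance = correlation matrix, variances fixed at 1
  cov_matrix[ind] Sigma;
  {
    int pos = 0;
    for (i in 1:ind) {
      for (j in 1:Nf) {
        if (l_p[i, j] != 0) {
          pos = pos + 1;
          loadings[i, j] = vec_load[pos];
        } else {
          loadings[i, j] = 0;
        }
      }
    }
  }
  Phi_nf = multiply_lower_tri_self_transpose(Nf_corr);
  // square(sd_res) assumes sd_res are residual standard deviations
  Sigma = quad_form_sym(Phi_nf, loadings') + diag_matrix(square(sd_res));
}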


Thanks for pointing that out. That is a mistake.

It looks like the first freely estimated elements of the loadings, sd_res, and Nf_var have large Rhats of 6.18, 14.90, and 14.57, respectively.
Here, I’m using a half-normal prior as suggested by @nhuurre.

I will try the fixed variance approach.

I am trying to first get the measurement model working before building on it to include the multilevel components. Another reason is I am new to using Stan, so I thought I could start with something I am more familiar with.

Thanks very much for the book reference. It should definitely help me with working on multilevel SEM data.

PS: I have realized that the model converges immediately, with Rhat = 1 for all parameters, when I constrain the loadings to be nonnegative. This, however, presents a theoretical challenge.

I agree, this does present theoretical limitations, as well as practical ones from constraining the prior distribution too much. That said, it is a common approach to fix one factor loading per factor to be positive to define the direction of the factor (factor indeterminacy). This is what Ed does in blavaan.


Thanks very much for your help.

This reminded me that we should add to the help text a comment that if Rhats are very large and don’t change much when the number of iterations is doubled, then the posterior is very likely multimodal with well-separated modes (it might also be due to aliasing). If the Rhats for each single chain are close to 1, then it’s even more likely to be multimodal with well-separated modes, each of which is easy to sample from. If the Rhats for each single chain are not close to 1, then it’s likely there is an identifiability issue. I don’t think we have mentioned this in the help texts yet, but in our paper [1903.08008] Rank-normalization, folding, and localization: An improved $\widehat{R}$ for assessing convergence of MCMC we do recommend looking at the effective sample size estimates when the number of iterations is increased. ESS is also easier to interpret for diagnosing multimodality, as in that case ESS <= the number of chains.


A simple solution is to set the latent variable variances to 1. Sample the loadings unconstrained and create a copy of the loadings in the transformed parameters block. Select an item per factor as a marker item. Check factor by factor: if the marker item loading is negative, multiply all the loadings on that factor by -1.

You will also have to make changes to the interfactor correlation matrix, so create a copy of this matrix in the transformed parameters block and multiply the relevant correlations by -1. For example, if the loadings on factor 2 were multiplied by -1, then all the correlations to factor 2 should also be multiplied by -1; remember to multiply both row 2 and column 2 of the correlation matrix.

You’d have to do all this before creating lambda_phi_lambda. blavaan now does something like this.
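In code, the idea is roughly the sketch below (using the names from the model at the top of the thread, plus a hypothetical data entry int marker[Nf]; giving the row of the chosen marker item for each factor):

transformed parameters {
  // ... existing declarations, plus:
  matrix[ind, Nf] loadings_adj; // sign-corrected copy of the loadings
  matrix[Nf, Nf] omega_adj;     // sign-corrected copy of the factor correlations

  // ... existing code that builds loadings and omega, then:
  loadings_adj = loadings;
  omega_adj = omega;
  for (j in 1:Nf) {
    // marker[j] is the (hypothetical) data index of the marker item for factor j
    if (loadings[marker[j], j] < 0) {
      loadings_adj[, j] = -loadings[, j];
      omega_adj[j, ] = -omega_adj[j, ]; // flip row j ...
      omega_adj[, j] = -omega_adj[, j]; // ... and column j (the diagonal flips back to +1)
    }
  }
  // build lambda_phi_lambda and Sigma from loadings_adj and omega_adj instead
}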


Just to clarify, blavaan does this in the generated quantities block, instead of the transformed parameters block. Right now blavaan has two different Stan models: one that creates factor scores through data augmentation (target="stanclassic"), and the now-default precompiled one, which does not estimate factor scores and works with the covariance-matrix equations, based on the LISREL parameterization (target="stan"). In both methods the parameters are sign-corrected in the generated quantities block.

You can export the Stan code for either method and see how this works in blavaan with the mcmcfile argument:

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit  <- bcfa(HS.model, data = HolzingerSwineford1939, std.lv = T, mcmcfile = "default", target = "stan")
fit2 <- bcfa(HS.model, data = HolzingerSwineford1939, std.lv = T, mcmcfile = "augmentation", target = "stanclassic")

OP can make these changes in either location in their current code, which follows the LISREL measurement equation.

I would not expect OP to have much use for the exported blavaan Stan code as it is quite bulky, and difficult to follow in certain parts.


Agreed, the exported blavaan Stan code is bulky and can be difficult to follow. I still like to recommend these options, as it’s a good way to learn coding tricks by seeing code that works.

As for the model, I worked out an example model similar to the one originally posted, with the sign-corrected parameters in the generated quantities block. Some comments about the model: I estimated the model with the data-augmentation approach, which for me at least makes it easier to extend to a multilevel CFA. But the log-likelihood for model comparison is computed with the covariance-matrix approach, so the marginal log-likelihood is used instead of the conditional one, as recommended by

Merkle, E. C., Furr, D., & Rabe-Hesketh, S. (2019). Bayesian model assessment: Use of conditional vs marginal likelihoods. Psychometrika, 84, 802–829.
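In other words, integrating the factor scores FS out of the data-augmentation model gives the marginal distribution X_i \sim N(b, \Lambda\Phi\Lambda' + \theta_{\delta}), with \theta_{\delta} the diagonal matrix of residual variances, and that closed form is what the generated quantities block below uses for log_lik.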

data{
int N; // sample size
int P; // number of variables
int D; // number of dimensions
vector[P] X[N]; // data matrix of order [N,P]
int n_lam; // how many factor loadings are estimated
}

parameters{
vector[P] b; // intercepts
matrix[N,D] FS_UNC; // factor scores, matrix of order [N,D]
cholesky_factor_corr[D] L_corr_d; // cholesky correlation between factors
vector<lower=0>[P] sd_p; // residual sd for each variable
vector<lower=-30, upper=30>[n_lam] lam_UNC; // not constrained to be positive
}

transformed parameters{
vector[D] M; // factor means
vector<lower=0>[D] Sd_d; // sd for each factor
vector[P] mu_UNC[N]; // predicted values
cholesky_factor_cov[D] L_Sigma;

M = rep_vector(0, D);
Sd_d = rep_vector(1, D);

L_Sigma = diag_pre_multiply(Sd_d, L_corr_d);


for(i in 1:N){
mu_UNC[i,1] = b[1] + lam_UNC[1]*FS_UNC[i,1];
mu_UNC[i,2] = b[2] + lam_UNC[2]*FS_UNC[i,1];
mu_UNC[i,3] = b[3] + lam_UNC[3]*FS_UNC[i,1];
mu_UNC[i,4] = b[4] + lam_UNC[4]*FS_UNC[i,2];
mu_UNC[i,5] = b[5] + lam_UNC[5]*FS_UNC[i,2];
mu_UNC[i,6] = b[6] + lam_UNC[6]*FS_UNC[i,2];
mu_UNC[i,7] = b[7] + lam_UNC[7]*FS_UNC[i,3];
mu_UNC[i,8] = b[8] + lam_UNC[8]*FS_UNC[i,3];
mu_UNC[i,9] = b[9] + lam_UNC[9]*FS_UNC[i,3];
}

}

model{

L_corr_d ~ lkj_corr_cholesky(1);
b ~ normal(0, 10);
lam_UNC ~ normal(0, 10);
sd_p ~ cauchy(0,2.5);
///Sd_d ~ cauchy(0,2.5);

for(i in 1:N){
for(j in 1:P){
X[i,j] ~ normal(mu_UNC[i,j], sd_p[j]);
}
FS_UNC[i] ~ multi_normal_cholesky(M, L_Sigma);
}

}

generated quantities{
real log_lik[N]; ///// log likelihood matrix
real dev; /////// global deviance
real log_lik0; ///// global log likelihood
vector[N] log_lik_row;

cov_matrix[P] Sigma;
cov_matrix[D] Phi_lv;
matrix[P,P] lambda_phi_lambda;
cholesky_factor_cov[P] L_Sigma_model;
matrix[P,P] theta_del;

corr_matrix[D] Rho_UNC; /// correlation matrix
corr_matrix[D] Rho; /// correlation matrix
matrix[P,D] lam;  // factor loadings
matrix[N,D] FS; // factor scores, matrix of order [N,D]
///vector[P] mu[N]; // predicted values

/// signed corrected parameters
Rho_UNC = multiply_lower_tri_self_transpose(L_corr_d);
Rho = Rho_UNC;
FS = FS_UNC;
lam = rep_matrix(0, P, D);
lam[1:3,1] = to_vector(lam_UNC[1:3]);
lam[4:6,2] = to_vector(lam_UNC[4:6]);
lam[7:9,3] = to_vector(lam_UNC[7:9]);

// factor 1
if(lam_UNC[1] < 0){
  lam[1:3,1] = to_vector(-1*lam_UNC[1:3]);
  FS[,1] = to_vector(-1*FS_UNC[,1]);
  
  if(lam_UNC[4] > 0){
  Rho[1,2] = -1*Rho_UNC[1,2];
  Rho[2,1] = -1*Rho_UNC[2,1];
  }
  if(lam_UNC[7] > 0){
  Rho[1,3] = -1*Rho_UNC[1,3];
  Rho[3,1] = -1*Rho_UNC[3,1];
  }
}
// factor 2
if(lam_UNC[4] < 0){
  lam[4:6,2] = to_vector(-1*lam_UNC[4:6]);
  FS[,2] = to_vector(-1*FS_UNC[,2]);
  
  if(lam_UNC[1] > 0){
  Rho[2,1] = -1*Rho_UNC[2,1];
  Rho[1,2] = -1*Rho_UNC[1,2];
  }
  if(lam_UNC[7] > 0){
  Rho[2,3] = -1*Rho_UNC[2,3];
  Rho[3,2] = -1*Rho_UNC[3,2];
  }
}
// factor 3
if(lam_UNC[7] < 0){
  lam[7:9,3] = to_vector(-1*lam_UNC[7:9]);
  FS[,3] = to_vector(-1*FS_UNC[,3]);
  
  if(lam_UNC[1] > 0){
  Rho[3,1] = -1*Rho_UNC[3,1];
  Rho[1,3] = -1*Rho_UNC[1,3];
  }
  if(lam_UNC[4] > 0){
  Rho[3,2] = -1*Rho_UNC[3,2];
  Rho[2,3] = -1*Rho_UNC[2,3];
  }
}


/// marginal log-likelihood based on signed corrected parameters
Phi_lv = quad_form_diag(Rho, Sd_d);
lambda_phi_lambda = quad_form_sym(Phi_lv, transpose(lam));
theta_del = diag_matrix(square(sd_p)); // residual variances (sd_p are standard deviations)

Sigma = lambda_phi_lambda + theta_del;
L_Sigma_model = cholesky_decompose(Sigma);

for(i in 1:N){
log_lik[i] = multi_normal_cholesky_lpdf(X[i] | b, L_Sigma_model);
}

log_lik0 = sum(log_lik); // global log-likelihood
dev = -2*log_lik0; // model deviance

}

I compared this model with lavaan and blavaan, with simulated data


library(lavaan)
library(loo)
library(blavaan)
library(rstan)
#rstan_options(auto_write = TRUE)
#options(mc.cores = parallel::detectCores())

mod1 <-'
f1 =~ .5*y1+.5*y2+.5*y3 #+.1*y4 +.2*y8
f2 =~ .4*y4+.4*y5+.4*y6 #+.1*y1 +.2*y7
f3 =~ .6*y7+.6*y8+.6*y9 #+.1*y6 +.2*y3

f1 ~~ .3*f2+.4*f3
f2 ~~ .5*f3'

mod2 <-'
f1 =~ y1+y2+y3
f2 =~ y4+y5+y6
f3 =~ y7+y8+y9

f1 ~~ f2+f3
f2 ~~ f3'

datsim <- simulateData(mod1,sample.nobs=300,std.lv=T)
head(datsim)

fit <- cfa(mod2,data=datsim,std.lv=T,meanstructure=T)
summary(fit,standardized=T,fit.measures=T)


fitb <- bcfa(mod2,data=datsim,std.lv=T,target="stan")
summary(fitb,standardized=T)
fitMeasures(fitb)

###### run the Stan model
X <- as.matrix(datsim)
P <- ncol(datsim) 
N <- nrow(datsim)
D <- 3

dat <- list(N=N,P=P,D=D,X=X, n_lam=9)
param <- c("b","lam","Rho","sd_p","M","Sd_d",
           "log_lik0","dev","log_lik")# 

##### loop to check for convergence 
ta2 <- Sys.time()
iters <- 5000
keep <- 3000
rhat <- 20
while(rhat > 1.05){
  iters <- iters + 3000
    burn <- iters - keep
  
  OUT_st <- stan(data=dat, pars=param,
                 model_code=cfa_stan_fv, # the Stan model code above, stored as a string
                 chains=3, iter=iters,warmup=burn,
                 control=list(adapt_delta=.99, max_treedepth=20)) 
  
  ss <- summary(OUT_st)$summary
  print(rhat <- max(ss[,"Rhat"], na.rm=T))
}
tb2 <- Sys.time()
tb2 - ta2

OUT_st


print(OUT_st,digits=3, pars=c("b","lam","Rho","sd_p","M","Sd_d",
                              "log_lik0","dev"))

log_lik1 <- extract_log_lik(OUT_st, merge_chains = FALSE) 
(waic_cfa <- waic(log_lik1))
(loo_cfa <- loo(log_lik1, r_eff=relative_eff(exp(log_lik1))))

I teach a BSEM summer course but haven’t worked on a multilevel model yet; I will try to develop a couple of useful examples from this. It’s an interesting issue.


This is a great example, thanks for posting this. Just trying to follow along here.

I would think that a different approach, in which one specifies the “first” and “remaining” factor loadings separately in the parameters block (building on your example):

parameters {
vector<lower=0, upper=30>[3] lam_first;
vector<lower=-30, upper=30>[6] lam_remain;
...
}

and then puts the lambda vector together in the transformed parameters block:

transformed parameters {
vector[n_lam] lambda;
lambda[1] = lam_first[1];
lambda[4] = lam_first[2];
lambda[7] = lam_first[3];
lambda[2:3] = lam_remain[1:2];
lambda[5:6] = lam_remain[3:4];
lambda[8:9] = lam_remain[5:6];
...
}

should also work and make the sign-correction in the generated quantities block unnecessary. Or am I overlooking something?

I tried this approach first. It kept giving me multimodal posteriors for the unconstrained lambdas. Trying to figure out why was taking more time than I wanted, so I switched to the approach of sign correction in the generated quantities block, which worked much more easily.
I would be happy to find out why the first approach wasn’t working.

The reason is that you could either have loadings 1-J all be positive, or have loading 1 be positive and loadings 2-J be negative, and it doesn’t affect the likelihood very much. In other words, you could flip the latent direction, have loadings 2-J be negative, and get the same predictions for (J-1)/J of the items. The only set of data that would suffer is that for item 1; in that case, the sampler can basically push that loading down to zero (as though it ‘wants’ to be negative), and item 1 reduces to an intercept-only model. The more items you have, the more easily this can happen.
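The exact version of this invariance, in the covariance notation used earlier in the thread: for any diagonal sign matrix D with \pm 1 entries, (\Lambda D)(D \Phi D)(\Lambda D)' + \theta_{\delta} = \Lambda \Phi \Lambda' + \theta_{\delta} because DD = I, so flipping a factor together with its loadings and its correlations leaves the implied covariance, and hence the likelihood, unchanged.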

Imo, the best approach is to just constrain all loadings to be in the /intended/ direction. Or, just unreverse any reverse-scaled indicators, and use all positive loadings. If you’re concerned about method effects, add a method effect to any reversed items.
In my experience, constraining all loadings to be unidirectional (except for cross-loadings, I suppose), and inverting the responses to meet that, will give good Rhats and unimodality every time.
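In the data-augmentation example above, that amounts to something like the following (a sketch only; it assumes any reverse-scored indicators have already been recoded so that all loadings are expected to be positive):

parameters {
  // all loadings constrained to the intended (positive) direction
  vector<lower=0, upper=30>[n_lam] lam;
  ...
}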


Yes, I agree, this is the issue of factor indeterminacy I described above. The issue is that constraining a factor loading to be positive fixes the problem in JAGS, but does not work in Stan. In Stan we have found that the best method is to estimate the model without constraints, and to sign-adjust the parameters as a function of a chosen factor loading, like in the example I posted before.

What I don’t like about this is that you are already deciding that the posterior cannot cover negative values, which in many scenarios could be too strict a constraint.


Hello!

I am constantly facing a mixing problem with a kind of “exploratory factor analysis”. I am trying to find the latent variables structuring the covariance of a small (31x16) dataset of count data with many zeros.

Here is my dataset. Y.csv (1.3 KB)

I tried to simplify the model as much as possible and finished with the following code.

data{
  int N; // sample size
  int S; // number of species
  int D; // number of factors
  int<lower=0> Y[N,S]; // data matrix of order [N,S]
  
  int<lower=0,upper=1> prior_only; //Should we estimate the likelihood?
}
transformed data{
  int<lower=1> M;
  vector[D] FS_mu; // factor means
  vector<lower=0>[D] FS_sd; // factor standard deviations
  M = D*(S-D)+ D*(D-1)/2;  // number of lower-triangular, non-zero loadings
  for (m in 1:D) {
    FS_mu[m] = 0; //Mean of factors = 0
    FS_sd[m] = 1; //Sd of factors = 1
  }
}
parameters{
  //Parameters
  vector[M] L_lower; //Lower diagonal loadings
  vector<lower=0>[D] L_diag; //Positive diagonal elements of loadings
  matrix[N,D] FS; //Factor scores, matrix of order [N,D]
  cholesky_factor_corr[D] Rho; //Correlation matrix between factors
}
transformed parameters{
  // Final parameters
  cholesky_factor_cov[S,D] L; //Final matrix of loadings
  matrix[D,D] Ld; //Cholesky decomposition of the covariance matrix between factors
  
  //Predictors
  matrix[N,S] Ups; //intermediate predictor
  matrix[N,S] Mu; //predictor
  
  // Correlation matrix of factors
  Ld = diag_pre_multiply(FS_sd, Rho); //Fs_sd fixed to 1, Rho estimated
  
  {
    int idx2; //Index for the lower diagonal loadings
    idx2 = 0;
    
    // Constraints to allow identifiability of loadings
  	 for(i in 1:(D-1)) { for(j in (i+1):(D)){ L[i,j] = 0; } } //0 on upper diagonal
  	 for(i in 1:D) L[i,i] = L_diag[i]; //Positive values on diagonal
  	 for(j in 1:D) {
  	   for(i in (j+1):S) {
  	     idx2 = idx2+1;
  	     L[i,j] = L_lower[idx2]; //Insert lower diagonal values in loadings matrix
  	   }
  	 }
  }
  
  // Predictors
  Ups = FS * L';
  for(n in 1:N) Mu[n,] = Ups[n,];
  
}
model{
  // Priors
  L_lower ~ student_t(7,0,0.5); //Weakly informative prior for lower diag loadings
  L_diag ~ student_t(7,0,0.5);//Weakly informative prior for diag loadings
  Rho ~ lkj_corr_cholesky(2); //Weakly informative prior for Rho
  
  for(i in 1:N){	
    if(prior_only == 0) Y[i,] ~ poisson_log(Mu[i,]);
    FS[i,] ~ multi_normal_cholesky(FS_mu, Ld);
  }
}
generated quantities{
  matrix[S,S] cov_L;
  matrix[S,S] cor_L;
  matrix[N,S] Y_pred;
  matrix[N,S] log_lik1;
  vector[N*S] log_lik;
  
  cov_L = multiply_lower_tri_self_transpose(L); //Create the covariance matrix
  
  // Compute the correlation matrix from the covariance matrix
  for(i in 1:S){
    for(j in 1:S){
      cor_L[i,j] = cov_L[i,j]/sqrt(cov_L[i,i]*cov_L[j,j]);
    }
  }
  
  //Compute the likelihood and predictions for each observation
  for(n in 1:N){
    for(s in 1:S){
      log_lik1[n,s] = poisson_log_lpmf(Y[n,s] | Mu[n,s]);
      Y_pred[n,s] = poisson_log_rng(Mu[n,s]);
    }
  }
  log_lik = to_vector(log_lik1); //Transform into a vector usable by the loo package
}

I thought this model would work without any trouble because a more complex version of it worked well on simulated data. In the current model, I removed the intercept and hierarchical parts. The zero upper-diagonal and the positive diagonal loadings seemed to be sufficient to allow efficient sampling with simulated data.

However, I cannot get proper sampling with my real data set. Perhaps because of the abundance of zeros? They occur because the studied sites are highly different.

Would you have any suggestions? I was wondering if I should try your “change the sign” technique, maybe in reference to the sign of the diagonal of the unconstrained loadings?

Thank you very much, and have a good day!
Lucas

And here is the code to run the model

## Compile the model
GBFM_mod <- stan_model(
  "Markdown/1_SEM_Teabags/Poisson_GBFM_Reparam.stan")

## Prepare stan data
GBFM_data <- list(
  Y = Y,
  N = nrow(Y),
  S = ncol(Y),
  D = 2,
  prior_only = 0)

# Sample from the posterior
GBFM_fit_3 <- sampling(GBFM_mod, data = GBFM_data,
                       iter = 2000,
                       cores = 3, chains = 3)

I tried a hurdle version, but nothing changed; the trouble really seems to be identifying the loadings.

data{
  int N; // sample size
  int S; // number of species
  int D; // number of factors
  int<lower=0> Y[N,S]; // data matrix of order [N,S]
  
  int<lower=0,upper=1> prior_only; //Should we estimate the likelihood?
}
transformed data{
  //Hyperparameters of factor scores
  int<lower=1> M;
  vector[D] FS_mu; // factor means
  vector<lower=0>[D] FS_sd; // factor standard deviations
  
  //Factors
  M = D*(S-D)+ D*(D-1)/2;  // number of lower-triangular, non-zero loadings
  for (m in 1:D) {
    FS_mu[m] = 0; //Mean of factors = 0
    FS_sd[m] = 1; //Sd of factors = 1
  }
}
parameters{
  //Parameters
  vector[M] L_lower; //Lower diagonal loadings
  vector<lower=0>[D] L_diag; //Positive diagonal elements of loadings
  matrix[N,D] FS; //Factor scores, matrix of order [N,D]
  cholesky_factor_corr[D] Rho; //Correlation matrix between factors
  real tau; //Shape of the zero counts
}
transformed parameters{
  // Final parameters
  cholesky_factor_cov[S,D] L; //Final matrix of loadings
  matrix[D,D] Ld; //Cholesky decomposition of the covariance matrix between factors
  
  //Predictors
  matrix[N,S] Ups; //intermediate predictor
  matrix[N,S] Theta; //predictor for 0 or 1
  matrix[N,S] Mu; //Predictor for counts
  
  // Correlation matrix of factors
  Ld = diag_pre_multiply(FS_sd, Rho); //Fs_sd fixed to 1, Rho estimated
  
  {
    int idx2; //Index for the lower diagonal loadings
    idx2 = 0;
    
    // Constraints to allow identifiability of loadings
  	 for(i in 1:(D-1)) { for(j in (i+1):(D)){ L[i,j] = 0; } } //0 on upper diagonal
  	 for(i in 1:D) L[i,i] = L_diag[i]; //Positive values on diagonal
  	 for(j in 1:D) {
  	   for(i in (j+1):S) {
  	     idx2 = idx2+1;
  	     L[i,j] = L_lower[idx2]; //Insert lower diagonal values in loadings matrix
  	   }
  	 }
  }
  
  // Predictors
  Ups = FS * L';
  for(n in 1:N) {
    Mu[n,] = exp(Ups[n,]);
    Theta[n,] = inv_logit(tau * Ups[n,]);
  }
  
}
model{
  // Priors
  L_lower ~ student_t(7,0,0.5); //Weakly informative prior for lower diag loadings
  L_diag ~ student_t(7,0,0.5);//Weakly informative prior for diag loadings
  Rho ~ lkj_corr_cholesky(2); //Weakly informative prior for Rho
  tau ~ student_t(7,0,1); 
  
  for(n in 1:N){
    for(s in 1:S){
      if(prior_only == 0){
      if (Y[n,s] == 0)
        target += log(Theta[n,s]);
      else
        target += log1m(Theta[n,s]) + poisson_lpmf(Y[n,s] | Mu[n,s])
                - log1m_exp(-Mu[n,s]);
      }
    }
    FS[n,] ~ multi_normal_cholesky(FS_mu, Ld);
  }
}
generated quantities{
  // matrix[S,S] cov_L;
  // matrix[S,S] cor_L;
  // matrix[N,S] Y_pred;
  // matrix[N,S] log_lik1;
  // vector[N*S] log_lik;
  // 
  // cov_L = multiply_lower_tri_self_transpose(L); //Create the covariance matrix
  // 
  // // Compute the correlation matrix from de covariance matrix
  // for(i in 1:S){
  //   for(j in 1:S){
  //     cor_L[i,j] = cov_L[i,j]/sqrt(cov_L[i,i]*cov_L[j,j]);
  //   }
  // }
  // 
  // //Compute the likelihood and predictions for each observation
  // for(n in 1:N){
  //   for(s in 1:S){
  //     log_lik1[n,s] = poisson_log_lpmf(Y[n,s] | Mu[n,s]);
  //     Y_pred[n,s] = poisson_log_rng(Mu[n,s]);
  //   }
  // }
  // log_lik = to_vector(log_lik1); //Tranform in a vector usable by loo package
}

EDIT: The resulting traceplots
