Sparse infinite factor priors

Hello!

I tried last week to fit such a model (with some other parameters, like a hierarchical intercept per row), but keeping the usual constraints of factor analysis (0 on upper diag of loadings and positive diagonal of loading matrix). If I understand well, Bhattacharya & Dunson argue that identifiability is not necessary for most applications of factor models, like covariance estimation. Ohma, the problem with this approach is that we loose a lot of diagnostic power, which is important for very complicated/diffuse models like these ones.

Fitting the following model, I had a lot of divergent transitions and BFMI warnings. Moreover, the shrinkage were unsufficient : I simulated 3 non-zero loadings, but the model estimated at least 7 non-zero loading. However, the covariance were not too badly approximated.

We have to keep in mind a major difference between Gibbs sampler proposed by the authors and stan. The sampler described in the paper remove columns falling under a determined threshold, and reimplement them randomly to ensure their “zeroness”. This principle cannot be implemented in stan, and I guess we have to find an efficient trick to replace it.

I abandonned this approach and started wondering about implementing horseshoe-like priors for the loadings, but I don’t know what is the better way to increase the shrinkage intensity with column indices. I thought about sampling the parameter of the local shrinkage cauchy distribution from another cauchy distribution with mu = 1/d, d being the column index, but I doubt such a rude method could work. @avehtari could certainly have some ideas!

data{
  int N; // sample size
  int S; // number of species
  int D; // number of factors
  int<lower=0> Y[N,S]; // data matrix of order [N,S]
}
transformed data{
  int<lower=1> M;
  vector[D] FS_mu; // factor means
  vector<lower=0>[D] FS_sd; // factor standard deviations
  M = D*(S-D)+ D*(D-1)/2;  // number of lower-triangular, non-zero loadings
  for (m in 1:D) {
    FS_mu[m] = 0; //Mean of factors = 0
    FS_sd[m] = 1; //Sd of factors = 1
  }
}
parameters{
  //Parameters
  real alpha; //Global intercept
  vector[N] d0_raw; //Uncentered row intercepts
  vector[M] L_lower; //Centered lower diagonal loadings
  vector<lower=0>[D] L_diag; //Centered positive diagonal elements of loadings
  matrix[N,D] FS; //Factor scores, matrix of order [N,D]
  cholesky_factor_corr[D] Rho; //Correlation matrix between factors
  real<lower=2> a1; //Hyperprior for first delta
  real<lower=3> a2; //Hyperprior for other deltas
  //Hyperparameters
  real<lower=0, upper=pi()/2> sigma_d_unif; //Uncentered sd of the row intercepts
  //Shrinkage parameters
  vector<lower=0>[M+D] phi_jh; //Local shrinkage parameter
  vector<lower=0>[D] delta_j; //Elements of the factor specific shrinkage priors
}
transformed parameters{
  // Final parameters
  vector[N] d0; //Final row intercepts
  cholesky_factor_cov[S,D] L; //Final matrix of laodings
  matrix[D,D] Ld; // cholesky decomposition of the covariance matrix between factors
  // Final hyperparameters
  real sigma_d; //Final sd of the row intercepts
  // Final shrinkage priors
  vector<lower=0>[D] tau_j; //Final factor specific shrinkage prior
  //Predictors
  matrix[N,S] Ups; //intermediate predictor
  matrix<lower=0>[N,S] Mu; //predictor
  
  // Compute the final hyperparameters
  sigma_d = 0 + 1 * tan(sigma_d_unif); //sigma_d ~ Halfcauchy(0,2.5)
  
  //Compute the final parameters
  d0 = 0 + sigma_d * d0_raw; //do ~ Normal(0, sigma_d)
  
  //Compute final global shrinkage prior
  for(d in 1:D) tau_j[d] = prod(delta_j[1:d]); //Compute the product of gamma distributed deltas
  
  // Correlation matrix of factors
  Ld = diag_pre_multiply(FS_sd, Rho); //Fs_sd fixed to 1, Rho estimated
  
  {
    int idx2; //Index for the lower diagonal loadings
    idx2 = 0;
    
    // Constraints to allow identifiability of loadings
  	 for(i in 1:(D-1)) { for(j in (i+1):(D)){ L[i,j] = 0; } } //0 on upper diagonal
  	 for(i in 1:D) L[i,i] = L_diag[i]; //Positive values on diagonal
  	 for(j in 1:D) {
  	   for(i in (j+1):S) {
  	     idx2 = idx2+1;
  	     L[i,j] = L_lower[idx2]; //Insert lower diagonal values in loadings matrix
  	   }
  	 }
  }
  
  // Predictors
  Ups = FS * L';
  for(n in 1:N) Mu[n] = exp(alpha + d0[n] + Ups[n]);
  
}
model{
  // Uncentered hyperpriors : the sampling will be automaticaly done as if they were defined on an uniform distribution between 0 or -pi/2 and pi/2 (see constraints)
  
  //Shrinkage priors
  {
    int idx3;
    idx3 = 0;
    for(d in 1:D){
      // Shrinkage prior for lower diagonal loadings
      for(m in (idx3+1):(d*(S-d)+ d*(d-1)/2)){
        L_lower[m] ~ normal(0, (1/phi_jh[m])*(1/tau_j[d]));
        idx3 = (d*(S-d)+ d*(d-1)/2);
     }
     // Shrinkage prior for diagonal loadings
     target += (D - d) * log(L_diag[d]) - .5 * L_diag[d] ^ 2 /
     ( (1/phi_jh[M+d]) * (1/tau_j[d]) );
    }
  }

  //Shrinkage hyperpriors
  phi_jh ~ gamma(3/2, 3/2);
  delta_j[1] ~ gamma(3,1);
  delta_j[2:D] ~ gamma(4,1);
  a1 ~ gamma(2,1);
  a2 ~ gamma(2,1);

  // Priors
  alpha ~ student_t(3,0,5); //Weakly informative prior for global intercept
  d0_raw ~ normal(0,1); //Uncentered regularizing prior for row deviations
  Rho ~ lkj_corr_cholesky(1); //Uninformative prior for Rho
  
  //Compute the likelihood
  for(i in 1:N){	
    Y[i,] ~ poisson(Mu[i,]);
    FS[i,] ~ multi_normal_cholesky(FS_mu, Ld);
  }
}
generated quantities{
  matrix[S,S] cov_L;
  matrix[S,S] cor_L;
  matrix[N,S] Y_pred;
  matrix[N,S] log_lik1;
  vector[N*S] log_lik;
  
  cov_L = multiply_lower_tri_self_transpose(L); //Create the covariance matrix
  
  // Compute the correlation matrix from de covariance matrix
  for(i in 1:S){
    for(j in 1:S){
      cor_L[i,j] = cov_L[i,j]/sqrt(cov_L[i,i]*cov_L[j,j]);
    }
  }
  
  //Compute the likelihood and predictions for each observation
  for(n in 1:N){
    for(s in 1:S){
      log_lik1[n,s] = poisson_lpmf(Y[n,s] | Mu[n,s]);
      Y_pred[n,s] = poisson_rng(Mu[n,s]);
    }
  }
  log_lik = to_vector(log_lik1); //Tranform in a vector usable by loo package
}

The data generating process and a detailed explanation of a similar code can be found here. The code is a little bit complicated because lower diag loadings are stored in a vector, and we have to select the ones corresponding to each columns (loop trick in the model part). For loadings, I tried to use the prior proposed by Leung & Drton (2014). This prior should “maintain invariance property under reordering variables”

I would be happy to continue the discussion, especially if we are able to find a reliable solution!

Lucas