Identifiability of GLLVM (factor analytic model)

Of course! Here’s the simple case with no covariates:

data {
  int<lower=1> N; //Number of samples
  int<lower=1> S; //Number of species
  int<lower=1> D; //Number of latent dimensions

  array[N, S] int Y; //Species matrix
}
transformed data{
  // Number of non-zero lower triangular factor loadings
  // Ensures identifiability of the model - no rotation of factors
  int<lower=1> M; 
  M = D * (S - D) + D * (D - 1) / 2 + D;
}
parameters {
  // Site intercepts
  real a_bar;
  real<lower=0> sigma_a;
  vector[N] a;
  
  // Species intercepts
  real<lower=0> sigma_b0;
  vector[S] b0;
  
  // Factor parameters
  vector[M] L; // lower triangle of species loadings
  real<lower=0> sigma_L; // variance of species loadings
  
  // Latent variables
  matrix[D, N] LV_uncor; // Per-site latent variable

  // NegBin parameters
  real<lower=0> kappa;
}
transformed parameters {
  // Construct factor loading matrix
  matrix[S, D] Lambda_uncor;
  // Constraints to allow identifiability of loadings
  for (i in 1:(D-1)) { 
    for (j in (i+1):(D)){ 
      Lambda_uncor[i,j] = 0; 
    } 
  }
  {
    int index;
    index = 0;
    for (j in 1:D) {
      for (i in j:S) {
        index = index + 1;
        Lambda_uncor[i, j] = L[index];
      } 
    }
  }
}
model {
  // Factor priors
  to_vector(LV_uncor) ~ std_normal();
  L ~ std_normal();

  // Random effect priors
  a ~ std_normal();
  b0 ~ std_normal();
  a_bar ~ std_normal();
  sigma_a ~ exponential(1);
  sigma_b0 ~ exponential(1);
  sigma_L ~ exponential(1);
  
  // Negative Binomial scale parameter
  kappa ~ exponential(1);
  
  array[N] vector[S] mu;
  for (i in 1:N) {
      mu[i,] = a_bar + a[i] * sigma_a + b0 * sigma_b0 + (Lambda_uncor * sigma_L) * LV_uncor[,i];
      Y[i,] ~ neg_binomial_2_log(mu[i, ], kappa);
  }
}
generated quantities {
  // Sign correct factor loadings and factors
  matrix[D, N] LV;
  matrix[S, D] Lambda;
  for(d in 1:D){
    if(Lambda_uncor[d,d] < 0){
      Lambda[,d] = -1 * Lambda_uncor[,d];
      LV[d,] = -1 * LV_uncor[d,];
    } else {
      Lambda[,d] = Lambda_uncor[,d];
      LV[d,] = LV_uncor[d,];
    }
  }
  
  // Calculate species covariance matrix
  matrix[S, S] COV;
  COV = multiply_lower_tri_self_transpose(Lambda);
}

Since there are no constraints on the diagonal of the factor loadings in this version, I’m sure that the code in the transformed parameters block can be simplified for a very minor time saving and/or clarity, but I haven’t had time to play with it as I’ve been busy in the lab for the past few weeks.

edit: I went ahead and modified the code for clarity. I also wanted to add that I’m really glad these models are tractable in Stan - the two major implementations of these models available to ecologists feel a bit lacking. The gllvm package is frequentist, with all that entails, and boral writes its model code in JAGS and then runs only one MCMC chain for precisely the identifiability issues noted above. The ability to run multiple chains, alongside the diagnostics from HMC, increases my confidence in making inferences from these models a lot.

In future, it might be interesting to write an R package that runs these models in Stan, or to see if they can be implemented in brms, to bring these more robust diagnostics to ecologists who are likely to overlook these issues.

7 Likes