Identifiability of GLLVM (factor analytic model)

Hi all,

I’ve been trying to implement a generalised linear latent variable model (GLLVM; see Warton et al. 2015) to analyse multivariate data corresponding to species communities. The model is as follows, for species j at site i:

y_{ij} \sim \text{NegBinomial}(\exp(m_{ij}), \kappa) \\ m_{ij} = a_i + \beta_{0j} + \textbf{x}_i'\beta_j + u_{ij}

where

u_{ij} = \textbf{z}_i'\lambda_j, \quad \textbf{z}_i \sim \text{Normal}(0, \textbf{I})

where exp() is the inverse of the log link, a_i is a per-site intercept, \beta_{0j} is a species-specific intercept, \textbf{x}_i is a vector of site-level covariates, and \beta_j is a vector of species-specific coefficients for these covariates. Finally, u_{ij} is the product of the site-level latent variables \textbf{z}_i and their species-specific factor loadings \lambda_j. The goal of the model is twofold: the latent variables \textbf{z}_i can be used to construct a constrained or unconstrained ordination, and the residual covariance (and hence correlation) between species can be reconstructed by calculating \Sigma = \Lambda\Lambda^T.
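
(To see where that last identity comes from: stacking the latent terms for site i into a vector \textbf{u}_i = \Lambda\textbf{z}_i, and using \textbf{z}_i \sim \text{Normal}(0, \textbf{I}),

\text{Cov}(\textbf{u}_i) = \Lambda \, \text{Cov}(\textbf{z}_i) \, \Lambda^T = \Lambda\Lambda^T

so the loadings fully determine the residual covariance.)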

Following this post by @rfarouni (Bayesian Factor Analysis), I have a working implementation of the model (leaving out the species-specific responses \textbf{x}_i'\beta_j for simplicity), which constrains all upper-triangular elements of the factor loadings to zero and the diagonal elements to be positive. My understanding is that these constraints aid identifiability by reducing the number of possible latent variable solutions (e.g. different orderings/signs of the factor loadings) to one. The post also recommends setting the init values for each chain to be close to one another, so I set all initial values to 0 using init = 0. This approach appears to work on a dataset simulated from the priors, finding a single solution, but when applied to a real dataset (the spiders dataset from the R package mvabund), different chains appear to converge on different solutions for the latent variables even with these constraints.

Despite finding different solutions for the values of \textbf{z}_i and \lambda_j, the linear predictors produced by each chain are largely consistent (note that the model fits the observed data terribly, likely due to the lack of environmental predictors).

I’m new to the world of factor analysis (and no mathematician), but my understanding is that there are in principle an infinite number of solutions for the latent variables and their loadings. From the perspective of constructing the species residual covariance matrix, the multiple solutions here pose little issue as far as I can tell, as long as the solutions to \Lambda\Lambda^T are similar enough, but for an ordination technique this is obviously problematic! Is it possible to improve the identifiability/reduce the number of possible latent variable solutions further, or is this likely to be as good as it gets? If the latter, the only solution I can think of for consistent ordination would be to run many more parallel chains and select a set that converge on the same solution for further analysis, but this seems rather arbitrary, and for the more complex models I’d like to build, inefficient and wasteful of wall time.

Model code is as follows:

data {
  int<lower=1> N; //Number of samples
  int<lower=1> S; //Number of species
  int<lower=1> D; //Number of latent dimensions

  array[N, S] int Y; //Species matrix
}
transformed data{
  // Number of non-zero lower triangular factor loadings
  // Ensures identifiability of the model
  int<lower=1> M; 
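  // (D*(S-D) entries below the top DxD block, plus
  //  D*(D-1)/2 strictly-lower entries within that block)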
  M = D * (S - D) + D * (D - 1) / 2;
}
parameters {
  // Site intercepts
  vector[N] a;
  real a_bar;
  real<lower=0> sigma_a;
  
  // Species intercepts
  vector[S] b0;
  real<lower=0> sigma_b0;
  
  // Factor parameters
  vector[M] L_lower; // lower triangle of species loadings
  vector<lower=0>[D] L_diag; // Diagonal of species loadings - constrained to positive
  matrix[N, D] LV; // Per-site latent variable

  // NegBin parameters
  real<lower=0> kappa;
}
transformed parameters {
  matrix[S, D] L;
  
  // Assign parameters to L matrix:
  {
    int idx2; // Index for the lower diagonal loadings
    idx2 = 0;

    // Constraints to allow identifiability of loadings
    // zeros above the diagonal
    for (i in 1:(D-1)) {
      for (j in (i+1):D) {
        L[i, j] = 0;
      }
    }
    // positive values on the diagonal
    for (i in 1:D) {
      L[i, i] = L_diag[i];
    }
    // fill the strictly-lower triangle column by column
    for (j in 1:D) {
      for (i in (j+1):S) {
        idx2 = idx2 + 1;
        L[i, j] = L_lower[idx2];
      }
    }
  }
}
model {
  // Factor priors
  to_vector(LV) ~ std_normal();
  L_lower ~ std_normal();
  L_diag ~ std_normal();
  
  // Random effect priors
  a ~ normal(0, 1);
  b0 ~ normal(0, 1);
  a_bar ~ normal(0, 1);
  sigma_a ~ exponential(1);
  sigma_b0 ~ exponential(1);
  
  // NegBin kappa prior
  kappa ~ exponential(1);
  
  array[N] vector[S] mu;
  for (i in 1:N) {
    for(j in 1:S) {
      mu[i, j] = exp(a_bar + a[i] * sigma_a + b0[j] * sigma_b0 +
                     dot_product(L[j], LV[i]));
      Y[i,j] ~ neg_binomial_2(mu[i, j], kappa);
    }
  }
}
generated quantities {
  array[N] vector[S] linpred;
  for (i in 1:N) {
    for(j in 1:S) {
      linpred[i, j] = a_bar + a[i] * sigma_a + b0[j] * sigma_b0 + dot_product(L[j], LV[i]);
    }
  }
  matrix[S, S] COV;
  COV = multiply_lower_tri_self_transpose(L);
}
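
For completeness, a minimal sketch of the fitting call with cmdstanr (the file name and the data object Y are placeholders), using init = 0 as mentioned above:

library(cmdstanr)

mod <- cmdstan_model("gllvm.stan")  # the model above, saved to file
fit <- mod$sample(
  data = list(N = nrow(Y), S = ncol(Y), D = 2, Y = Y),
  init = 0,   # identical (zero) inits for every chain
  chains = 4
)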

Function to simulate dataset:

generate_sim_data <- function(N, S, D){
  a_bar <- rnorm(1, 0, 1)
  sigma_a <- rexp(1, 1)
  sigma_b0 <- rexp(1, 1)
  a <- rnorm(N, mean = 0, sd = 1)
  b0 <- rnorm(S, mean = 0, sd = 1)
  M <- D * (S - D) + D * (D - 1) / 2

  L_l <- rnorm(M, 0, 1)
  L_d <- abs(rnorm(D, 0, 1))
  L <- matrix(nrow = S, ncol = D)
  
  kappa <- rexp(1, 1)
  
  idx2 <- 0
  for (i in 1:(D-1)) { for (j in (i+1):D) { L[i, j] <- 0 } }
  for (i in 1:D) L[i, i] <- L_d[i]
  for (j in 1:D) {
    for (i in (j+1):S) {
      idx2 <- idx2 + 1
      L[i, j] <- L_l[idx2]
    }
  }

  LV <- matrix(rnorm(N * D, 0, 1), nrow = N, ncol = D)
  
  Y <-  matrix(nrow = N, ncol = S)
  for(i in 1:N) {
    for(j in 1:S){
      mu_ij <- a_bar + a[i] * sigma_a + b0[j] * sigma_b0 + sum(LV[i, ] * L[j, ])
      Y[i, j] <- rethinking::rgampois(1, mu = exp(mu_ij), scale = kappa)
    }
  }
  
  corr <- cov2cor(L %*% t(L))
  
  pars =  list(a_bar = a_bar,
               sigma_a = sigma_a,
               sigma_b0 = sigma_b0,
               a = a, b0 = b0,
               L = L,
               LV = LV, Y = Y)
  
  return(list(Y = Y, corr = corr, pars = pars))
}
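
For example, to simulate a dataset with the same dimensions as the spiders data (28 sites, 12 species) and two latent dimensions:

set.seed(42)
sim <- generate_sim_data(N = 28, S = 12, D = 2)
dim(sim$Y)          # 28 x 12 matrix of counts
round(sim$corr, 2)  # true residual correlations, to compare with cov2cor(COV)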

Simulated dataset used above is here: sim_data.csv (1.0 KB)


After some trawling through the literature, I found a solution (well, a pair of similar solutions), so I thought I would put an update here in case someone has the same problem somewhere in the future. I’m sure my explanation below is limited in its technical breadth, but hopefully it’s useful to someone.

The problem, as I’ve learned, is that the factor loadings are only identifiable up to \Lambda\Lambda^T, and so different MCMC chains can easily end up exploring different modes. Constraints on the values of \Lambda can help with identifiability - in particular, constraining the upper triangle to zeroes forces the modes found to differ only by sign, whereas having no constraints means that the modes can also be rotations of one another. Additional constraints, such as constraining the diagonal to be positive, have been claimed to force a single signed mode, but for reasons far past my understanding the sampler can still find non-trivial modes such as the ones above.
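
A quick toy check (mine, not from the literature) makes the rotation problem concrete: for any orthogonal matrix Q, \Lambda Q reproduces exactly the same \Lambda\Lambda^T, so without constraints every rotation is an equally good mode:

set.seed(1)
Lambda <- matrix(rnorm(5 * 2), nrow = 5, ncol = 2)
theta <- 0.7
Q <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)  # rotation
max(abs(Lambda %*% t(Lambda) - (Lambda %*% Q) %*% t(Lambda %*% Q)))    # ~1e-16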

The solutions I have found involve post-processing the posterior draws in order to choose a single one of the explored modes. The first (Erosheva and Curtis 2017, Dealing with reflection invariance in Bayesian factor analysis) uses the upper-triangle-equals-zero constraint and returns a draws x dimensions set of sign changes. The second (Papastamoulis and Ntzoufras 2020, [2004.05105] On the identifiability of Bayesian factor analytic models) uses no constraints on \Lambda and involves a varimax rotation step before calculating sign changes.

As I am only interested in the values of \Lambda insofar as they determine \Lambda\Lambda^T, but do wish to interpret the values of the latent variables as an ordination, the first solution is more convenient for me, as it only requires applying the sign changes to the latent variables, rather than also extracting and applying the same rotation to them. Applying this solution drops the Rhat values from somewhere in the region of 1.8 down to 1-1.01.
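
To give a flavour of what the post-processing does, here is a minimal sketch of the reflection fix only (not the full Erosheva and Curtis algorithm, which chooses the sign pattern more cleverly; the array names are mine):

# For each posterior draw, flip any factor column whose diagonal loading is
# negative, and flip the matching latent variable column, so that the
# linear predictor is unchanged.
# Assumes Lambda_draws is draws x S x D and LV_draws is draws x N x D.
fix_signs <- function(Lambda_draws, LV_draws) {
  for (it in seq_len(dim(Lambda_draws)[1])) {
    for (d in seq_len(dim(Lambda_draws)[3])) {
      if (Lambda_draws[it, d, d] < 0) {
        Lambda_draws[it, , d] <- -Lambda_draws[it, , d]
        LV_draws[it, , d] <- -LV_draws[it, , d]
      }
    }
  }
  list(Lambda = Lambda_draws, LV = LV_draws)
}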


I have a couple of simple follow-up questions that I haven’t quite been able to figure out that would help apply this post-processing step more easily in a workflow:

  1. Is it possible to modify the draws held within a CmdStanMCMC object (ideally in a copy)?
  2. Is it possible to extract the dimensions of \Lambda from a CmdStanMCMC object? I would rather not have to regex these out of the parameter names if possible!

Hi, it’s great that you were able to find a solution, and I really enjoyed reading your explanation; it will certainly be useful for others.

Is there a reason you would like to keep them within that object? In theory you could write the modified draws back to CSV, but I don’t think that is directly supported, so there might be something that fails. How about modifying them in a posterior object?

I recommend checking out the rvar type in the posterior package (rvar: The Random Variable Datatype • posterior). Then you can just do dim(draws$Lambda).
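
Something like this (a sketch, assuming a CmdStanMCMC object fit with a parameter named Lambda):

library(posterior)

draws <- as_draws_rvars(fit$draws())
draws$Lambda       # an rvar holding all draws in the parameter's own shape
dim(draws$Lambda)  # c(S, D) - no regex over parameter names needed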


For dealing with reflection invariance like this, I’ve also had a good deal of success in the past using the sign-correction approach from this post: Non-convergence of latent variable model - #13 by Mauricio_Garnier-Villarre


Hello, may I ask a basic question about your model? Could you please tell me the interpretation of u[i,j], as well as z[i] and lambda[j], in your context? Thank you.

The notation here follows Box 1 of the Warton et al paper linked in the original post. That paper is the best explanation that I’ve seen of this model for an ecological audience. If you’re unable to access the paper, DM me your email address and I’ll send you a copy.


Many thanks! I have a lot to learn (and I do want to learn!), but I always find it useful to try and distil my understanding to retain the key conceptual points even if I don’t quite follow the mathematical details.

Yes, this is definitely a better idea - I was writing functions that take a whole model object for downstream inference and plotting (hence my desire to modify the object), but quickly realised I’m better off just storing my modified draws elsewhere and using them as needed.

Genius - thinking about it, I can probably also output my vectors of sign changes as a multi-column rvar, which would make applying them downstream very straightforward.

Thanks for this - I searched the forums before making this post but somehow didn’t stumble across this one! Interesting to see that sign-correction can work within the GQ block - I would have assumed that this would be infeasible as the “global” solution cannot be shared between chains.


The sign change approach works because the model is identified and presents a bimodal distribution. In every factor analysis model, the identification method is arbitrary, and the same happens with the sign direction. So we don’t need to “know” what the “correct” direction is. You choose which direction you want the factors to take, and the model will adjust to it, allowing negative values etc.
Think of the sign direction as part of the identification constraints.
Be careful that you need to sign-correct factor loadings, factor correlations, and factor regressions. (I might be forgetting some other parameter type, depending on your model.)

On another note, you can also build the model with a data augmentation approach, and change the regression between factors and items (the factor loadings) to a negative binomial instead of a linear regression.


That makes sense, thanks! I implemented this today and it works a treat - I’m more than happy not to rely on a post-processing step.

I do wonder now, though - if it’s really as straightforward as choosing a direction for each factor, why are people designing complex iterative algorithms to solve the same problem?

Glad this worked!

I haven’t played with these approaches, but I do know the sign correction doesn’t work in every case - for example, it’s more complicated for latent class models. These iterative methods might help in some of those scenarios.

Also, for general SEM, I contribute to the package blavaan, which adds a lot of extra functionality. It does not work for negative binomial outcomes, but it works well for continuous items, and categorical indicators will be added soon.


I just want to thank @prototaxites and everyone here for a really clear discussion of these models! GLLVMs are interesting to lots of ecologists and this is sure to be a frequently-visited post :)
@prototaxites, I was wondering if you’d be willing to post your version including @Mauricio_Garnier-Villarre’s sign-correction step? It’s sure to be useful to others (e.g. me) trying to follow the same approach!


Of course! Here’s the simple case with no covariates:

data {
  int<lower=1> N; //Number of samples
  int<lower=1> S; //Number of species
  int<lower=1> D; //Number of latent dimensions

  array[N, S] int Y; //Species matrix
}
transformed data{
  // Number of non-zero lower triangular factor loadings
  // Ensures identifiability of the model - no rotation of factors
  int<lower=1> M; 
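  // (as in the earlier model, plus the D diagonal entries,
  //  which are no longer constrained to be positive)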
  M = D * (S - D) + D * (D - 1) / 2 + D;
}
parameters {
  // Site intercepts
  real a_bar;
  real<lower=0> sigma_a;
  vector[N] a;
  
  // Species intercepts
  real<lower=0> sigma_b0;
  vector[S] b0;
  
  // Factor parameters
  vector[M] L; // lower triangle of species loadings
  real<lower=0> sigma_L; // scale of species loadings
  
  // Latent variables
  matrix[D, N] LV_uncor; // Per-site latent variable

  // NegBin parameters
  real<lower=0> kappa;
}
transformed parameters {
  // Construct factor loading matrix
  matrix[S, D] Lambda_uncor;
  // Constraints to allow identifiability of loadings
  for (i in 1:(D-1)) {
    for (j in (i+1):D) {
      Lambda_uncor[i, j] = 0;
    }
  }
  {
    int index;
    index = 0;
    for (j in 1:D) {
      for (i in j:S) {
        index = index + 1;
        Lambda_uncor[i, j] = L[index];
      } 
    }
  }
}
model {
  // Factor priors
  to_vector(LV_uncor) ~ std_normal();
  L ~ std_normal();

  // Random effect priors
  a ~ std_normal();
  b0 ~ std_normal();
  a_bar ~ std_normal();
  sigma_a ~ exponential(1);
  sigma_b0 ~ exponential(1);
  sigma_L ~ exponential(1);
  
  // Negative Binomial scale parameter
  kappa ~ exponential(1);
  
  array[N] vector[S] mu;
  for (i in 1:N) {
      mu[i] = a_bar + a[i] * sigma_a + b0 * sigma_b0 + (Lambda_uncor * sigma_L) * col(LV_uncor, i);
      Y[i] ~ neg_binomial_2_log(mu[i], kappa);
  }
}
generated quantities {
  // Sign correct factor loadings and factors
  matrix[D, N] LV;
  matrix[S, D] Lambda;
  for (d in 1:D) {
    if (Lambda_uncor[d, d] < 0) {
      Lambda[ : , d] = -col(Lambda_uncor, d);
      LV[d] = -LV_uncor[d];
    } else {
      Lambda[ : , d] = col(Lambda_uncor, d);
      LV[d] = LV_uncor[d];
    }
  }
  
  // Calculate species covariance matrix; the effective loadings in the
  // model are sigma_L * Lambda_uncor, so include the scale here
  matrix[S, S] COV;
  COV = square(sigma_L) * multiply_lower_tri_self_transpose(Lambda);
}

Since there are no constraints on the diagonal of the factor loadings in this version, I’m sure that the code in the transformed parameters block can be simplified for a very minor time saving and/or clarity, but I haven’t had time to play with it as I’ve been busy in the lab for the past few weeks.

edit: I went ahead and modified the code for clarity. I also wanted to add that I’m really glad these models are tractable in Stan - the two major implementations of these models available to ecologists feel a bit lacking. The gllvm package is frequentist, with all that entails, and boral writes its model code in JAGS and then runs only one MCMC chain, precisely because of the identifiability issues noted above. The ability to run multiple chains, alongside the diagnostics from HMC, greatly increases my confidence in making inferences from these models.

In future, it might be interesting to write an R package that runs these models in Stan, or to see if they can be implemented in brms, to bring these more robust diagnostics to ecologists who are likely to overlook these issues.


Thank you so much @prototaxites ! and I couldn’t agree more with your assessment of the current tools around these kinds of models. Also, my read of the room is that any ecologist willing to go as far as fitting one of these complex, subtle models is unlikely to be satisfied with somebody else’s canned procedure. Breaking these models out of bespoke packages and into clear Stan code is a service to the community!
