Multiple modes in latent variable model -- identifiability, or something else?

Hi everyone,

I feel a little bad for opening another thread about this when there are already some similar ones, but hopefully my question is sufficiently different to be interesting.

I am fitting the model below:

data {
  int n; // number of sites
  int k; // number of species
  int p; // number of predictors
  int l; // number of latents

  int y[n, k]; // number of observations for each species
  matrix[n, p] X; // design matrix of predictors
}
transformed data {
  int flattened_y[n*k] = to_array_1d(y);
}
parameters {
  matrix[p, k] beta_1; // species response to predictors
  cholesky_factor_cov[k, l] beta_z; // species response to latents
  matrix[n, l] latents; // latents at each site
}
model {
  matrix[n, k] log_lambda;
  log_lambda = X * beta_1 + latents * beta_z';

  to_vector(beta_1) ~ normal(0, 1);
  to_vector(beta_z) ~ normal(0, 1);

  to_vector(latents) ~ normal(0, 1);
  flattened_y ~ poisson_log(to_vector(log_lambda'));
}

The way I like to think about this model is that each site latent has a multivariate normal prior centred on the linear prediction:

\pmb{y_i} \sim Poisson(\pmb{\theta_i})

log(\pmb{\theta_i}) \sim N(\pmb{B}^{T} \pmb{x_i}, \pmb{\Lambda} \pmb{\Lambda}^{T})

where \pmb{y_i} is the vector of (species) counts at site i, \pmb{B} is a matrix of environmental predictors, and \pmb{\Lambda} is a lower triangular matrix with positive values on the diagonal.

The thing that makes this model latent is that l < k, that is, we are making a low-rank approximation to the covariance matrix.

When I set l = 4 with ~50 species, the model converges fine on the environmental variables, with all \hat{R} looking good. However, some of the \pmb{\Lambda} entries have \hat{R} much greater than 1.1.

It makes sense to me that the “loadings” are only unique up to an orthogonal factor, since the resulting covariance matrix and thus the likelihood would be identical. What surprised me however was that the resulting covariance matrix \pmb{\Lambda}\pmb{\Lambda}^T also had not converged, with some entries looking multi-modal.

Out of interest, I fit a full-rank covariance matrix by setting l = k, and that seemed to converge fine. My best guess of what’s going on is that there are actually multiple plausible covariance matrices at l = 4, and the sampler may switch between them. They do not have the same likelihood, but they are sufficiently close to be plausible. Is this something people have observed, or is there something wrong with my model / thinking about identifiability?

Appreciate your thoughts.

Thanks,
Martin