Of course! Here’s the simple case with no covariates:
data {
int<lower=1> N; //Number of samples
int<lower=1> S; //Number of species
int<lower=1> D; //Number of latent dimensions
array[N, S] int Y; //Species matrix
}
transformed data{
// Number of non-zero lower triangular factor loadings
// Ensures identifiability of the model - no rotation of factors
int<lower=1> M;
M = D * (S - D) + D * (D - 1) / 2 + D;
}
parameters {
// Site intercepts
real a_bar;
real<lower=0> sigma_a;
vector[N] a;
// Species intercepts
real<lower=0> sigma_b0;
vector[S] b0;
// Factor parameters
vector[M] L; // lower triangle of species loadings
real<lower=0> sigma_L; // variance of species loadings
// Latent variables
matrix[D, N] LV_uncor; // Per-site latent variable
// NegBin parameters
real<lower=0> kappa;
}
transformed parameters {
// Construct factor loading matrix
matrix[S, D] Lambda_uncor;
// Constraints to allow identifiability of loadings
for (i in 1:(D-1)) {
for (j in (i+1):(D)){
Lambda_uncor[i,j] = 0;
}
}
{
int index;
index = 0;
for (j in 1:D) {
for (i in j:S) {
index = index + 1;
Lambda_uncor[i, j] = L[index];
}
}
}
}
model {
// Factor priors
to_vector(LV_uncor) ~ std_normal();
L ~ std_normal();
// Random effect priors
a ~ std_normal();
b0 ~ std_normal();
a_bar ~ std_normal();
sigma_a ~ exponential(1);
sigma_b0 ~ exponential(1);
sigma_L ~ exponential(1);
// Negative Binomial scale parameter
kappa ~ exponential(1);
array[N] vector[S] mu;
for (i in 1:N) {
mu[i,] = a_bar + a[i] * sigma_a + b0 * sigma_b0 + (Lambda_uncor * sigma_L) * LV_uncor[,i];
Y[i,] ~ neg_binomial_2_log(mu[i, ], kappa);
}
}
generated quantities {
// Sign correct factor loadings and factors
matrix[D, N] LV;
matrix[S, D] Lambda;
for(d in 1:D){
if(Lambda_uncor[d,d] < 0){
Lambda[,d] = -1 * Lambda_uncor[,d];
LV[d,] = -1 * LV_uncor[d,];
} else {
Lambda[,d] = Lambda_uncor[,d];
LV[d,] = LV_uncor[d,];
}
}
// Calculate species covariance matrix
matrix[S, S] COV;
COV = multiply_lower_tri_self_transpose(Lambda);
}
Since there are no constraints on the diagonal of the factor loadings in this version, I’m sure that the code in the transformed parameters block can be simplified for a very minor time saving and/or clarity, but I haven’t had time to play with it as I’ve been busy in the lab for the past few weeks.
edit: I went ahead and modified the code for clarity. I also wanted to add that I’m really glad these models are tractable in Stan - the two major implementations of these models available to ecologists feel a bit lacking. The gllvm package is frequentist, with all that entails, and boral writes its model code in JAGS and then runs only one MCMC chain for precisely the identifiability issues noted above. The ability to run multiple chains, alongside the diagnostics from HMC, increases my confidence in making inferences from these models a lot.
In future, it might be interesting to write an R package that runs these models in Stan, or to see if they can be implemented in brms, to bring these more robust diagnostics to ecologists who are likely to overlook these issues.