# Help with factor analysis (latent variable model)

Hello!

I am exploring latent variable model, as described in ecological papers (from Warton, Ovaskainen, Blanchet et al.). The principal aim is to model residual correlations among species abundances after controlling for environmental variables.

After reading the google group discussion and Rick Farouni’s blog post, I tried to reproduce a simple example from the boral package, which realize exactly that by samping posteriors with jags.

EDIT : I just remembered this discussion on discourse, and I realized I have done exactly the same thing. However, I thought I followed Rick’s recommendation, but it seems I encounter the same problems…

I tried different formulations, but I always encounter BFMI warning, and I can’t reproduce boral results.

Here is the stan code

``````data{
int N; // sample size
int P; // number of species
int D; // number of dimensions
int<lower=0> Y[N,P]; // data matrix of order [N,P]
}
transformed data{
int<lower=1> M;
}
parameters{
//Parameters
vector[N] alpha; //Row intercepts
row_vector[P] b0; // Intercept per species
matrix[N,D] FS; //Factor scores, matrix of order [N,D]
cholesky_factor_corr[D] Rho; //Correlation matrix between factors
//Hyperparameters
real<lower=0> sigma_a; //Variance of the row intercepts
real<lower=0> sigma_b0; //Variance of the species intercepts
real<lower=0> mu_lower;
real<lower=0> sigma_lower;
real<lower=0> eta; //Parameter of LKJ prior for Rho
}
transformed parameters{
cholesky_factor_cov[P,D] lambda; //Final matrix of laodings
vector[D] FS_mu; // factor means
vector<lower=0>[D] FS_sd; // factor standard deviations
matrix[D,D] Ld; // cholesky decomposition of the covariance matrix between factors
matrix[N,P] temp; //intermediate predictor
matrix<lower=0>[N,P] mu; // predicted values

for (m in 1:D) {
FS_mu[m] = 0; //Mean of factors = 0
FS_sd[m] = 1;} //Sd of factors = 1

// Correlation matrix of factors
Ld = diag_matrix(FS_sd) * Rho;
{
idx2 = 0;

for(i in 1:(D-1)) { for(j in (i+1):(D)){ lambda[i,j] = 0; } } //0 on upper diagonal
for(i in 1:D) lambda[i,i] = L_diag[i]; //Positive values on diagonal
for(j in 1:D) {
for(i in (j+1):P) {
idx2 = idx2+1;
lambda[i,j] = L_lower[idx2];
}
}
}

// Predictor
temp = FS * lambda';
for(n in 1:N) mu[n] = exp(alpha[n] + b0 + temp[n]);

}
model{
// Hyperpriors
sigma_a ~ gamma(2,0.1); //Row intercepts hyperprior
sigma_b0 ~ gamma(2,0.1); //Species intercept effects hyperprior
eta ~ gamma(2,0.1); //Parameter of the cholesky prior for FS correlation structure

// Priors
alpha ~ normal(0, sigma_a); //Regularizing prior for row intercepts
b0 ~ normal(0, sigma_b0); //Regularizing prior for species intercepts
L_diag ~ normal(0,0.1); //Prior for diagonal elements
Rho ~ lkj_corr_cholesky(eta); //Regularizing prior for Rho

for(i in 1:N){
Y[i,] ~ poisson(mu[i]);
FS[i,] ~ multi_normal_cholesky(FS_mu, Ld);
}
}
generated quantities{
//cov_matrix[P] cov_lambda;

//cov_lambda = lambda * lambda´
}
``````

And a R code producing the reproducible example.

``````rm(list = ls())

library(rstan)
library(boral) ## require jags
library(mvabund)

# We will use the spider dataset available in mvabund, pitfall trap counts of 12 hunting spider species at 28 sites. First load it from mvabund:
data(spider)

# Cherry-pick a few variables
spid.abund = spider\$abund[,c(1:3,7,8,12)]

## Prepare data for stan model
Y <- spid.abund
N <- nrow(Y)
P <- ncol(Y)
D <- 3

StanDat <- list(Y = Y, N = N, P = P, D = D)

## Fit the model with stan
lvm_stan <- stan("Simple_latent_matrix.stan", data = StanDat,
iter = 2000, chain = 3, cores = 3)
## Fit the model with boral
lvm_boral <- boral(y = Y, num.lv = D, family = "poisson", row.eff = "random")
``````

Any idea?

Thank you!
Lucas

2 Likes

First element of solution : I avoided BFMI problems and divergent transitions by remplacing the estimated mean of factors by 0.

I will check if I this code can recover parameters from a simulated data set.

However, I don’t understand why cov_matrix is not positive-definite (error if I define cov_lambda as a covariance matrix) : I made lambda having 0 on the upper-diagonal elements, and have positive constrained diagonal.

``````generated quantities{
matrix[P,P] cov_lambda;

cov_lambda = multiply_lower_tri_self_transpose(lambda);
}
``````

Not sure at all if this is the reason why you are encountering the problems you flag, but is it possible that the conventional Poisson model is not the most appropriate specfication here? Could there be, for example, excess zeroes and non-linearities in the data that need to be accommodated?

Sadykova, D., Scott, B.E., Dominicis, M.D., Wakelin, S.L., Sadykov, A., & Wolf, J. (2017). Bayesian joint models with INLA exploring marine mobile predator-prey and competitor species habitat overlap. Ecology and Evolution, 7(14), 212-5226. doi: 10.1002/ece3.3081

You are perfectly right, and a negative-binomial could be even sufficient.

However, I notice something : I have only one hyperparameter for every loadings of factor, would I not be better to have each factor’s loading with a unique variance?

I’m yet wandering how to code that efficiently, given that my non-zero non diagonal loadings are stored in a vector…

Nevermind, ignore this, I’m super wrong.

The eigenvalues of L would be positive, but not necessarily LL^T.

Check L =
[1, 0;
-5, 1]

Oh you’re right, I did not understand that! Thanks! Is there a way to constrain LL^T to be positive semi-definite??

Lol I don’t think I am. No paper no pencil, trying to think it out in my head, blech, mistakes were made.

You could do that, but it would seem to require a lot of background knowledge that may not be available.

I thought about that because of papers of Ovaskainen team (2016, 2017), which implement two very interesting features :

• Sparse infinite bayesian factor models. This approach cannot be directly implemented in stan, but could potentially be approached by shrinkage or marginalization, I think, even if I do not undestand the subtility of these subjects
• Determine underlying structure of factors (spatial in 2016 paper) or loadings (dependant of environmental variables in 2017 paper).

I would love to be able to implement that in stan language and exploit the powerful sampler it represents.

R tells me you are probably right :D

``````A <- matrix(c(1,0,-5,1), ncol = 2, byrow = T)
A
[,1] [,2]
[1,]    1    0
[2,]   -5    1
eigen(A)
eigen() decomposition
\$values
[1] 1 1

\$vectors
[,1]         [,2]
[1,]    0 4.440892e-17
[2,]    1 1.000000e+00

A %*%t(A)
[,1] [,2]
[1,]    1   -5
[2,]   -5   26
eigen(A%*%t(A))
eigen() decomposition
\$values
[1] 26.96291202  0.03708798

\$vectors
[,1]       [,2]
[1,] -0.1891075 -0.9819564
[2,]  0.9819564 -0.1891075``````
``````data{
int N; // sample size
int P; // number of species
int D; // number of dimensions
int<lower=0> Y[N,P]; // data matrix of order [N,P]
}
transformed data{
vector[D] FS_mu; // factor means
vector<lower=0>[D] FS_sd; // factor standard deviations
int<lower=1> M;
for (m in 1:D) {
FS_mu[m] = 0; //Mean of factors = 0
FS_sd[m] = 1;
} //Sd of factors = 1
}
parameters{
//Parameters
vector[N] alpha; //Row intercepts
row_vector[P] b0; // Intercept per species
matrix[N,D] FS; //Factor scores, matrix of order [N,D]
cholesky_factor_corr[D] Rho; //Correlation matrix between factors
//Hyperparameters
real<lower=0> sigma_a; //Variance of the row intercepts
real<lower=0> sigma_b0; //Variance of the species intercepts
real<lower=0> mu_lower;
real<lower=0> sigma_lower;
real<lower=0> eta; //Parameter of LKJ prior for Rho
}
transformed parameters{
cholesky_factor_cov[P,D] lambda; //Final matrix of laodings

matrix[D,D] Ld; // cholesky decomposition of the covariance matrix between factors
matrix[N,P] temp; //intermediate predictor
matrix<lower=0>[N,P] mu; // predicted values

// Correlation matrix of factors
Ld = diag_pre_multiply(FS_sd, Rho);
{
idx2 = 0;

for(i in 1:(D-1)) { for(j in (i+1):(D)){ lambda[i,j] = 0; } } //0 on upper diagonal
for(i in 1:D) lambda[i,i] = L_diag[i]; //Positive values on diagonal
for(j in 1:D) {
for(i in (j+1):P) {
idx2 = idx2+1;
lambda[i,j] = L_lower[idx2];
}
}
}

// Predictor
temp = FS * lambda';
for(n in 1:N) mu[n] = exp(alpha[n] + b0 + temp[n]);

}
model{
// Hyperpriors
sigma_a ~ gamma(2,0.1); //Row intercepts hyperprior
sigma_b0 ~ gamma(2,0.1); //Species intercept effects hyperprior
eta ~ gamma(2,0.1); //Parameter of the cholesky prior for FS correlation structure

// Priors
alpha ~ normal(0, sigma_a); //Regularizing prior for row intercepts
b0 ~ normal(0, sigma_b0); //Regularizing prior for species intercepts
L_diag ~ normal(0,0.1); //Prior for diagonal elements
Rho ~ lkj_corr_cholesky(eta); //Regularizing prior for Rho

for(i in 1:N){
Y[i,] ~ poisson(mu[i]);
FS[i,] ~ multi_normal_cholesky(FS_mu, Ld);
}
}
generated quantities{
//cov_matrix[P] cov_lambda;

//cov_lambda = lambda * lambda´
}
``````

I modified your code minimally to what I think is absolutely necessary to get rid of the BFMI
warning. Nevertheless I believe that your hyper-priors need refinement.

1 Like

Thank you very much!

You essentially moved the fixed parameters of factors in the tranformed data block to avoid sampling, isn’t it?

While reading your new version, I just realized that I let a strictly positive hyperprior for mu_lower, coming from another version in which I constrained loadings to be positive : a great amount of trouble could have come from this misspecification!

Yes. I changed this two:

Something to discuss:

`````` for(i in 1:(D-1)) { for(j in (i+1):(D)){ lambda[i,j] = 0; } } //0 on upper diagonal
for(i in 1:D) lambda[i,i] = L_diag[i]; //Positive values on diagonal
for(j in 1:D) {
for(i in (j+1):P) {
idx2 = idx2+1;
lambda[i,j] = L_lower[idx2];
}
}
``````

The above isn’t this just very similar to the Barlett Decomposition of the Wishart Distribution,
check matrix A with L is unit diagonal, thus redundant, or can be used as scale matrix.

If so, we do not need mu_lower and sigma_lower.

Last, I would change the sigma’s from gamma to cauchy:

Ok, the Barlett decomposition really looks like what is implemented in the jags code from boral (normal prior for the lower diag, zero on the upper). However, they do not sample the diagonal from a chi-squared distribution, but from a truncated normal…

Thank you very much for these insights, I would not have guess that by myself!

``````## JAGS model written for boral version 1.5 on 2018-04-20 09:43:59 ##

model {
## Data Level ##
for(i in 1:n) {
for(j in 1:p) { eta[i,j] <- inprod(lv.coefs[j,2:(num.lv+1)],lvs[i,]) + row.coefs.ID1[row.ids[i,1]] }
for(j in 1:p) { y[i,j] ~ dpois(exp(lv.coefs[j,1] + eta[i,j])) }

}

## Latent variables ##
for(i in 1:n) { for(k in 1:num.lv) { lvs[i,k] ~ dnorm(0,1) } }

## Process level and priors ##
for(j in 1:p) { lv.coefs[j,1] ~ dnorm(0,0.1) } ## Separate species intercepts

for(i in 1:n.ID[1]) { row.coefs.ID1[i] ~ dnorm(0, pow(row.sigma.ID1,-2)) }
row.sigma.ID1 ~ dunif(0,30)

for(i in 1:(num.lv-1)) { for(j in (i+2):(num.lv+1)) { lv.coefs[i,j] <- 0 } } ## Constraints to 0 on upper diagonal
for(i in 1:num.lv) { lv.coefs[i,i+1] ~ dnorm(0,0.1)I(0,) } ## Sign constraints on diagonal elements
for(i in 2:num.lv) { for(j in 2:i) { lv.coefs[i,j] ~ dnorm(0,0.1) } } ## Free lower diagonals
for(i in (num.lv+1):p) { for(j in 2:(num.lv+1)) { lv.coefs[i,j] ~ dnorm(0,0.1) } } ## All other elements

}``````

That’s ok in Stan too. You can sample from a truncated normal as well. It’s fine.
You may also consider to use a lkj correlation distribution instead of a Barlett Decomp. of Wishart.

`lv.coefs[i,i+1] ~ dnorm(0,0.1)I(0,)`

be aware that this would be `normal(0, sqrt(1/0.1));` in Stan, don’t forget to use constraints.
Also the gamma’s I suspect to be sqrt of inv_gamma in Stan.

Oh, I didn’t know about these differences between jags and stan…
I have litteraly implemented the Barlett decomposition in my code, and it works very well : no divergence and rhats and effective sample sizes looks greats for all loadings! The observed covariance matrix seems to be greatly approximated. I will make quickly a case study with simulated data!

``````data{
int N; // sample size
int P; // number of species
int D; // number of dimensions
int<lower=0> Y[N,P]; // data matrix of order [N,P]
}
transformed data{
int<lower=1> M;
vector[D] FS_mu; // factor means
vector<lower=0>[D] FS_sd; // factor standard deviations
for (m in 1:D) {
FS_mu[m] = 0; //Mean of factors = 0
FS_sd[m] = 1; //Sd of factors = 1
}
}
parameters{
//Parameters
vector[N] alpha; //Row intercepts
row_vector[P] b0; // Intercept per species
matrix[N,D] FS; //Factor scores, matrix of order [N,D]
cholesky_factor_corr[D] Rho; //Correlation matrix between factors
//Hyperparameters
real<lower=0> sigma_a; //Variance of the row intercepts
real<lower=0> sigma_b0; //Variance of the species intercepts
real<lower=0> eta; //Parameter of LKJ prior for Rho
}
transformed parameters{
cholesky_factor_cov[P,D] lambda; //Final matrix of laodings
matrix[D,D] Ld; // cholesky decomposition of the covariance matrix between factors
matrix[N,P] temp; //intermediate predictor
matrix<lower=0>[N,P] mu; // predicted values

// Correlation matrix of factors
Ld = diag_pre_multiply(FS_sd, Rho);

{
idx2 = 0;

for(i in 1:(D-1)) { for(j in (i+1):(D)){ lambda[i,j] = 0; } } //0 on upper diagonal
for(i in 1:D) lambda[i,i] = L_diag[i]; //Positive values on diagonal
for(j in 1:D) {
for(i in (j+1):P) {
idx2 = idx2+1;
lambda[i,j] = L_lower[idx2];
}
}
}

// Predictor
temp = FS * lambda';
for(n in 1:N) mu[n] = exp(alpha[n] + b0 + temp[n]);

}
model{
// Hyperpriors
sigma_a ~ gamma(2,0.1); //Row intercepts hyperprior
sigma_b0 ~ gamma(2,0.1); //Species intercept effects hyperprior
eta ~ gamma(2,0.1); //Parameter of the cholesky prior for FS correlation structure

// Priors
alpha ~ normal(0, sigma_a); //Regularizing prior for row intercepts
b0 ~ normal(0, sigma_b0); //Regularizing prior for species intercepts
for(d in 1:D) L_diag[d] ~ chi_square(M-d);
Rho ~ lkj_corr_cholesky(eta); //Regularizing prior for Rho

for(i in 1:N){
Y[i,] ~ poisson(mu[i]);
FS[i,] ~ multi_normal_cholesky(FS_mu, Ld);
}
}
generated quantities{
matrix[P,P] cov_lambda;

cov_lambda = multiply_lower_tri_self_transpose(lambda);
}
``````
1 Like

If `L` is square and lower diagonal, then as long as it has positive diagonals, `L * L'` will be positive definite. Otherwise, you can use a non-square `L` for lower ranks. I don’t know if there’s a way to force it to be semi-definite of unknown rank.

If someone wants to code up an efficient Wishart and inverse Wishart for Cholesky factors of covariance matrices, that’d be great. We have the correlation-based LKJ prior for Cholesky factors of correlation matrices, but that’s it for efficient priors.

Cool!

The efficient Wishart for Cholesky factors of covariance matrices is already in the Stan manual pg. 350.
The inverse W., is easy to calculate, to sample one has to use the transposed the Inverse.

Since A^-1 = (L^-1)^T L^-1 = U^-1 (U^-1)^T.