Hello!
I tried last week to fit such a model (with some other parameters, like a hierarchical intercept per row), but keeping the usual constraints of factor analysis (0 on upper diag of loadings and positive diagonal of loading matrix). If I understand well, Bhattacharya & Dunson argue that identifiability is not necessary for most applications of factor models, like covariance estimation. Ohma, the problem with this approach is that we loose a lot of diagnostic power, which is important for very complicated/diffuse models like these ones.
Fitting the following model, I had a lot of divergent transitions and BFMI warnings. Moreover, the shrinkage were unsufficient : I simulated 3 non-zero loadings, but the model estimated at least 7 non-zero loading. However, the covariance were not too badly approximated.
We have to keep in mind a major difference between Gibbs sampler proposed by the authors and stan. The sampler described in the paper remove columns falling under a determined threshold, and reimplement them randomly to ensure their “zeroness”. This principle cannot be implemented in stan, and I guess we have to find an efficient trick to replace it.
I abandonned this approach and started wondering about implementing horseshoe-like priors for the loadings, but I don’t know what is the better way to increase the shrinkage intensity with column indices. I thought about sampling the parameter of the local shrinkage cauchy distribution from another cauchy distribution with mu = 1/d, d being the column index, but I doubt such a rude method could work. @avehtari could certainly have some ideas!
data{
int N; // sample size
int S; // number of species
int D; // number of factors
int<lower=0> Y[N,S]; // data matrix of order [N,S]
}
transformed data{
int<lower=1> M;
vector[D] FS_mu; // factor means
vector<lower=0>[D] FS_sd; // factor standard deviations
M = D*(S-D)+ D*(D-1)/2; // number of lower-triangular, non-zero loadings
for (m in 1:D) {
FS_mu[m] = 0; //Mean of factors = 0
FS_sd[m] = 1; //Sd of factors = 1
}
}
parameters{
//Parameters
real alpha; //Global intercept
vector[N] d0_raw; //Uncentered row intercepts
vector[M] L_lower; //Centered lower diagonal loadings
vector<lower=0>[D] L_diag; //Centered positive diagonal elements of loadings
matrix[N,D] FS; //Factor scores, matrix of order [N,D]
cholesky_factor_corr[D] Rho; //Correlation matrix between factors
real<lower=2> a1; //Hyperprior for first delta
real<lower=3> a2; //Hyperprior for other deltas
//Hyperparameters
real<lower=0, upper=pi()/2> sigma_d_unif; //Uncentered sd of the row intercepts
//Shrinkage parameters
vector<lower=0>[M+D] phi_jh; //Local shrinkage parameter
vector<lower=0>[D] delta_j; //Elements of the factor specific shrinkage priors
}
transformed parameters{
// Final parameters
vector[N] d0; //Final row intercepts
cholesky_factor_cov[S,D] L; //Final matrix of laodings
matrix[D,D] Ld; // cholesky decomposition of the covariance matrix between factors
// Final hyperparameters
real sigma_d; //Final sd of the row intercepts
// Final shrinkage priors
vector<lower=0>[D] tau_j; //Final factor specific shrinkage prior
//Predictors
matrix[N,S] Ups; //intermediate predictor
matrix<lower=0>[N,S] Mu; //predictor
// Compute the final hyperparameters
sigma_d = 0 + 1 * tan(sigma_d_unif); //sigma_d ~ Halfcauchy(0,2.5)
//Compute the final parameters
d0 = 0 + sigma_d * d0_raw; //do ~ Normal(0, sigma_d)
//Compute final global shrinkage prior
for(d in 1:D) tau_j[d] = prod(delta_j[1:d]); //Compute the product of gamma distributed deltas
// Correlation matrix of factors
Ld = diag_pre_multiply(FS_sd, Rho); //Fs_sd fixed to 1, Rho estimated
{
int idx2; //Index for the lower diagonal loadings
idx2 = 0;
// Constraints to allow identifiability of loadings
for(i in 1:(D-1)) { for(j in (i+1):(D)){ L[i,j] = 0; } } //0 on upper diagonal
for(i in 1:D) L[i,i] = L_diag[i]; //Positive values on diagonal
for(j in 1:D) {
for(i in (j+1):S) {
idx2 = idx2+1;
L[i,j] = L_lower[idx2]; //Insert lower diagonal values in loadings matrix
}
}
}
// Predictors
Ups = FS * L';
for(n in 1:N) Mu[n] = exp(alpha + d0[n] + Ups[n]);
}
model{
// Uncentered hyperpriors : the sampling will be automaticaly done as if they were defined on an uniform distribution between 0 or -pi/2 and pi/2 (see constraints)
//Shrinkage priors
{
int idx3;
idx3 = 0;
for(d in 1:D){
// Shrinkage prior for lower diagonal loadings
for(m in (idx3+1):(d*(S-d)+ d*(d-1)/2)){
L_lower[m] ~ normal(0, (1/phi_jh[m])*(1/tau_j[d]));
idx3 = (d*(S-d)+ d*(d-1)/2);
}
// Shrinkage prior for diagonal loadings
target += (D - d) * log(L_diag[d]) - .5 * L_diag[d] ^ 2 /
( (1/phi_jh[M+d]) * (1/tau_j[d]) );
}
}
//Shrinkage hyperpriors
phi_jh ~ gamma(3/2, 3/2);
delta_j[1] ~ gamma(3,1);
delta_j[2:D] ~ gamma(4,1);
a1 ~ gamma(2,1);
a2 ~ gamma(2,1);
// Priors
alpha ~ student_t(3,0,5); //Weakly informative prior for global intercept
d0_raw ~ normal(0,1); //Uncentered regularizing prior for row deviations
Rho ~ lkj_corr_cholesky(1); //Uninformative prior for Rho
//Compute the likelihood
for(i in 1:N){
Y[i,] ~ poisson(Mu[i,]);
FS[i,] ~ multi_normal_cholesky(FS_mu, Ld);
}
}
generated quantities{
matrix[S,S] cov_L;
matrix[S,S] cor_L;
matrix[N,S] Y_pred;
matrix[N,S] log_lik1;
vector[N*S] log_lik;
cov_L = multiply_lower_tri_self_transpose(L); //Create the covariance matrix
// Compute the correlation matrix from de covariance matrix
for(i in 1:S){
for(j in 1:S){
cor_L[i,j] = cov_L[i,j]/sqrt(cov_L[i,i]*cov_L[j,j]);
}
}
//Compute the likelihood and predictions for each observation
for(n in 1:N){
for(s in 1:S){
log_lik1[n,s] = poisson_lpmf(Y[n,s] | Mu[n,s]);
Y_pred[n,s] = poisson_rng(Mu[n,s]);
}
}
log_lik = to_vector(log_lik1); //Tranform in a vector usable by loo package
}
The data generating process and a detailed explanation of a similar code can be found here. The code is a little bit complicated because lower diag loadings are stored in a vector, and we have to select the ones corresponding to each columns (loop trick in the model part). For loadings, I tried to use the prior proposed by Leung & Drton (2014). This prior should “maintain invariance property under reordering variables”
I would be happy to continue the discussion, especially if we are able to find a reliable solution!
Lucas