I have been running into some trouble getting sparsely loaded factor models to work.
My model features an ordinal likelihood, but I think the problem would persist in the continuous setting:
functions {
/* sratio-probit log-PDF for a single response
* Args:
* y: response category
* mu: linear predictor
* thres: ordinal thresholds
* disc: discrimination parameter
* Returns:
* a scalar to be added to the log posterior
*/
real sratio_probit_lpmf(int y, real mu, vector thres, real disc) {
int ncat = num_elements(thres) + 1;
vector[ncat] p;
vector[ncat - 1] q;
for (k in 1:(ncat - 1)) {
q[k] = 1 - Phi(disc * (thres[k] - mu));
p[k] = 1 - q[k];
for (kk in 1:(k - 1)) p[k] = p[k] * q[kk];
}
p[ncat] = prod(q);
return categorical_lpmf(y | p);
}
}
data {
int<lower=1> D;
int<lower=1> N;
int<lower=1> S;
int<lower=1> K;
int testmin;
int testmax;
int<lower=1,upper=D> group[N];
int<lower=testmin, upper=testmax> y[S,N];
real<lower=1> p0; // Expected number of large slopes
}
transformed data {
real slab_scale = 1; // Scale for large slopes
real slab_scale2 = square(slab_scale);
real slab_df = 25; // Effective degrees of freedom for large slopes
real half_slab_df = 0.5 * slab_df;
real tau0 = (p0 / (K - p0)) * (1. / sqrt(1.0 * S));
}
parameters {
matrix<lower=0>[K,N] W_tilde;
vector[testmax-2] gamma_tilde[D];
unit_vector[S] B_tilde[K];
//horseshoe
matrix<lower=0>[K,N] lambda;
real<lower=0> tau_tilde;
real<lower=0> c2_tilde;
}
transformed parameters {
matrix[K,N] W;
matrix[S,N] F;
real llk = 0;
matrix<lower=0,upper=1>[K,N] shrinkage;
{
matrix[S,K] B;
vector[testmax-1] gamma[D];
matrix[K,N] lambda_tilde;
//horseshoe
real tau = tau0 * tau_tilde;
real c2 = slab_scale2 * c2_tilde;
lambda_tilde = sqrt( c2 * square(lambda) ./ (c2 + square(tau) * square(lambda)) );
shrinkage = (tau/sqrt(c2))*lambda_tilde;
//factors
for (k in 1:K) {
B[,k] = B_tilde[k];
}
//ordinal likelihood weights
for (d in 1:D) {
gamma[d,1] = 0.;
gamma[d,2:] = gamma_tilde[d];
}
W = tau * lambda_tilde .* W_tilde;
F = (B*W);
for (n in 1:N) {
for (s in 1:S) {
llk += sratio_probit_lpmf(y[s,n] | F[s,n], gamma[group[n]], disc);
}
}
}
}
model {
//basis vectors
for (k in 1:K) {
B_tilde[k] ~ normal(0.,1.);
}
//loadings
to_vector(W_tilde) ~ normal(0.,1.);
//sparsity
to_vector(lambda) ~ cauchy(0., 1.);
tau_tilde ~ cauchy(0., 1.);
c2_tilde ~ inv_gamma(half_slab_df, half_slab_df);
//
for (d in 1:D)
gamma_tilde[d] ~ normal(0.,1.);
target += llk;
}
The key quantities are the unit-vector factors B
and the “finnish horseshoe” sparse loadings W
. I have followed Betancourt’s horseshoe implementation, adapted to factor models.
My problem seems to be one of non-identifiability, as the traceplots of the local sparsity scales lambda
features spiking behaviour where each random variable is predominantly zero, but with several tall spikes, so that none of the variables remain uniformly sparse or unshrunk throughout the trace. Furthermore, the (non-sparse) factor elements are all almost symmetrically centered around zero, indicating that the factors are either irrelevant or unidentified. One hypothesis is that there might be something at play similar to the butterfly geometry of http://mc-stan.org/users/documentation/case-studies/identifying_mixture_models.html, where the highly shrunk weights makes it possible for factor permutations to occur.