I am trying to fit a factor model with a sparse non-negative loading matrix. To model the sparsity, I am using a mixture of two gamma distributions to enforce the non-negativity.
data {
int<lower=0> N; //number of students
int<lower=0> NY; //number of items
int<lower=0> N_eta; //number of skills
real a; //set to 10 for the spike variable
array[N, NY] int<lower = 0, upper = 1> Y; //dichotomously scored items in wide format
}
parameters {
//Q-matrix
matrix<lower = 0, upper = 1>[NY, N_eta] theta; //r_k
matrix<lower = 0>[NY, N_eta] Z_spike;
matrix<lower = 0>[NY, N_eta] Z_slab;
//Concept Knowledge Map i.e. Factors
matrix[N, N_eta] Z_eta; //MVN(0, I)
}
transformed parameters{
matrix<lower = 0>[NY, N_eta] W;
{
vector[NY * N_eta] vec_theta = to_vector(theta);
vector[NY * N_eta] vec_slab = to_vector(Z_slab);
vector[NY * N_eta] vec_spike = to_vector(Z_spike);
vector[NY * N_eta] W_vec = vec_theta .* vec_slab + (1 - vec_theta) .* vec_spike;
W = to_matrix(W_vec, NY, N_eta);
}
}
model {
matrix[N, NY] mu = Z_eta*W';
to_vector(theta) ~ beta(1, 1.5); //uniform(0, 1);
to_vector(Z_slab) ~ gamma(2,2); //gamma(5, 6); Trying gamma(2, 2) also
to_vector(Z_spike) ~ gamma(a, 1/a);
for(k in 1:N_eta){
Z_eta[, k] ~ std_normal();
}
target += bernoulli_logit_lupmf(to_array_1d(Y) | to_vector(mu));
}
However, my estimates of the loadings from W are inaccurate. Any recommendations on how to model the sparse loading would be greatly appreciated.
I haven’t tried to run the model, but it looks like you are putting a spike and slab on every entry of W and hoping that enough entries stay at 0 to keep the model parameters identified. I think this will cause problems where different skills switch between different columns of W across iterations of the MCMC, and there could also be rotational indeterminacy problems.
If I was coding this, I would get the model working with traditional priors (not spike/slab) on entries of W, then move to spikes and slabs after that. To identify the loadings, it is traditional to take the top N_eta \times N_eta square of W, the fix the entries in the upper triangle to 0. You might not want to do that, but I am guessing that this is why you have problems with the estimates of W. The papers below describe some related issues and propose some new solutions.