# Regularization Factor Models

Hi,

I am trying to fit a factor model with a sparse non-negative loading matrix. To model the sparsity, I am using a mixture of two gamma distributions to enforce the non-negativity.

data {
int<lower=0> N; //number of students
int<lower=0> NY; //number of items
int<lower=0> N_eta; //number of skills
real a; //set to 10 for the spike variable

array[N, NY] int<lower = 0, upper = 1> Y; //dichotomously scored items in wide format
}

parameters {
//Q-matrix
matrix<lower = 0, upper = 1>[NY, N_eta] theta; //r_k
matrix<lower = 0>[NY, N_eta] Z_spike;
matrix<lower = 0>[NY, N_eta] Z_slab;

//Concept Knowledge Map i.e. Factors
matrix[N, N_eta] Z_eta; //MVN(0, I)
}

transformed parameters{

matrix<lower = 0>[NY, N_eta] W;
{
vector[NY * N_eta] vec_theta = to_vector(theta);
vector[NY * N_eta] vec_slab = to_vector(Z_slab);
vector[NY * N_eta] vec_spike = to_vector(Z_spike);

vector[NY * N_eta] W_vec = vec_theta .* vec_slab + (1 - vec_theta) .* vec_spike;

W = to_matrix(W_vec, NY, N_eta);
}
}

model {
matrix[N, NY] mu = Z_eta*W';

to_vector(theta) ~ beta(1, 1.5); //uniform(0, 1);
to_vector(Z_slab) ~ gamma(2,2); //gamma(5, 6); Trying gamma(2, 2) also
to_vector(Z_spike) ~ gamma(a, 1/a);

for(k in 1:N_eta){
Z_eta[, k] ~ std_normal();
}

target += bernoulli_logit_lupmf(to_array_1d(Y) | to_vector(mu));

}


However, my estimates of the loadings from W are inaccurate. Any recommendations on how to model the sparse loading would be greatly appreciated.

Thank you!

I have included my data generation code

factor_reg <- function(N_Student, N_Item, N_Skill, seed = 1234,
perc_miss){

set.seed(seed)
Q <- Matrix::rsparsematrix(nrow = N_Item, ncol = N_Skill, density = 1 - perc_miss,
rand.x = \(n) runif(n, min = 0.1, max = 1.5)) |>
as.matrix()

student_prof <- mvtnorm::rmvnorm(n = N_Student, mean = rep(0, N_Skill), sigma = diag(1, N_Skill))

nu <- tcrossprod(x = student_prof, y = Q)
Y <- matrix(999, nrow = N_Student,ncol = N_Item)
for(j in 1:N_Item){
Y[, j] <- rbinom(n = N_Student, size = 1, prob = plogis(nu[, j]))
}

list(Y = Y, Q_matrix = Q, N_eta = N_Skill, NY = N_Item, N = N_Student)

}


I haven’t tried to run the model, but it looks like you are putting a spike and slab on every entry of W and hoping that enough entries stay at 0 to keep the model parameters identified. I think this will cause problems where different skills switch between different columns of W across iterations of the MCMC, and there could also be rotational indeterminacy problems.

If I was coding this, I would get the model working with traditional priors (not spike/slab) on entries of W, then move to spikes and slabs after that. To identify the loadings, it is traditional to take the top N_eta \times N_eta square of W, the fix the entries in the upper triangle to 0. You might not want to do that, but I am guessing that this is why you have problems with the estimates of W. The papers below describe some related issues and propose some new solutions.

Thank you very much for the resources. I will go through them and make the necessary changes to the model.