Regularized Factor Models

Hi,

I am trying to fit a factor model with a sparse, non-negative loading matrix. To induce sparsity while keeping the loadings non-negative, I am using a continuous spike-and-slab prior built from a mixture of two gamma distributions.
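To be explicit, the prior I have in mind for each loading (writing Gamma(shape, rate), matching Stan's parameterization) is

$$
W_{jk} = \theta_{jk}\, Z^{\text{slab}}_{jk} + (1 - \theta_{jk})\, Z^{\text{spike}}_{jk},
\qquad
\theta_{jk} \sim \text{Beta}(1, 1.5), \quad
Z^{\text{slab}}_{jk} \sim \text{Gamma}(2, 2), \quad
Z^{\text{spike}}_{jk} \sim \text{Gamma}(1/a, a),
$$

with likelihood $\Pr(Y_{ij} = 1) = \text{logit}^{-1}\big((Z_\eta W^\top)_{ij}\big)$. The Stan implementation is below.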

data {
  int<lower=0> N; //number of students
  int<lower=0> NY; //number of items
  int<lower=0> N_eta; //number of skills
  real a; //spike concentration; set to 10
  
  array[N, NY] int<lower = 0, upper = 1> Y; //dichotomously scored items in wide format
}

parameters {
  //Q-matrix
  matrix<lower = 0, upper = 1>[NY, N_eta] theta; //r_k, mixing weight: probability each loading comes from the slab
  matrix<lower = 0>[NY, N_eta] Z_spike; //spike component (concentrated near 0)
  matrix<lower = 0>[NY, N_eta] Z_slab;  //slab component
  
  //Concept Knowledge Map i.e. Factors
  matrix[N, N_eta] Z_eta; //MVN(0, I)
}

transformed parameters{
  
  matrix<lower = 0>[NY, N_eta] W;  
  {
    vector[NY * N_eta] vec_theta = to_vector(theta);
    vector[NY * N_eta] vec_slab = to_vector(Z_slab);
    vector[NY * N_eta] vec_spike = to_vector(Z_spike);   
    
    vector[NY * N_eta] W_vec = vec_theta .* vec_slab + (1 - vec_theta) .* vec_spike; 
    
    W = to_matrix(W_vec, NY, N_eta);
  }  
}


model {
  matrix[N, NY] mu = Z_eta*W';

  to_vector(theta) ~ beta(1, 1.5); //uniform(0, 1) is another option
  to_vector(Z_slab) ~ gamma(2, 2); //also tried gamma(5, 6)
  //gamma(shape = 1/a, rate = a) has mean 1/a^2, so larger a concentrates the spike near 0
  to_vector(Z_spike) ~ gamma(1/a, a);
  
  to_vector(Z_eta) ~ std_normal(); //factors, MVN(0, I)
  
  //to_array_1d() flattens the 2-D array Y row-major, while to_vector()
  //flattens a matrix column-major, so transpose mu to align the orderings
  target += bernoulli_logit_lupmf(to_array_1d(Y) | to_vector(mu'));

}

However, the posterior estimates of the loadings in W are inaccurate. Any recommendations on how to model the sparse loadings would be greatly appreciated.

Thank you!

I have included my data-generating code below:

factor_reg <- function(N_Student, N_Item, N_Skill, seed = 1234, 
                          perc_miss){
    
    set.seed(seed)
    # sparse, non-negative loading matrix; perc_miss is the proportion of zero entries
    Q <- Matrix::rsparsematrix(nrow = N_Item, ncol = N_Skill, density = 1 - perc_miss, 
                               rand.x = \(n) runif(n, min = 0.1, max = 1.5)) |> 
        as.matrix()
    
    
    # latent student proficiencies, MVN(0, I)
    student_prof <- mvtnorm::rmvnorm(n = N_Student, mean = rep(0, N_Skill), sigma = diag(1, N_Skill))
    
    # linear predictor on the logit scale: student_prof %*% t(Q)
    nu <- tcrossprod(x = student_prof, y = Q)
    Y <- matrix(NA_integer_, nrow = N_Student, ncol = N_Item)
    for(j in 1:N_Item){
        Y[, j] <- rbinom(n = N_Student, size = 1, prob = plogis(nu[, j]))
    }
    
    list(Y = Y, Q_matrix = Q, N_eta = N_Skill, NY = N_Item, N = N_Student)
    
}

I haven’t tried to run the model, but it looks like you are putting a spike-and-slab prior on every entry of W and hoping that enough entries stay at 0 to keep the model parameters identified. I think this will cause label switching, where the skills swap between columns of W across MCMC iterations, and there could also be rotational indeterminacy.
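To see the column-switching problem concretely, note that for any $N_\eta \times N_\eta$ permutation matrix $P$,

$$
Z_\eta W^\top = Z_\eta P P^\top W^\top = (Z_\eta P)(W P)^\top,
$$

so relabeling the skills (permuting the columns of $Z_\eta$ and $W$ together) leaves the likelihood unchanged, and your priors treat all columns exchangeably, so nothing breaks the tie.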

If I were coding this, I would first get the model working with traditional priors (not spike/slab) on the entries of W, then move to spikes and slabs after that. To identify the loadings, it is traditional to take the top N_eta × N_eta square of W and fix the entries in its upper triangle to 0 (see the sketch below). You might not want to do that, but I am guessing this is why you have problems with the estimates of W. The papers below describe some related issues and propose some new solutions.
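Here is a minimal, untested sketch of that identification constraint, with the spike/slab machinery stripped out and a gamma(2, 2) stand-in prior on the free loadings (you would tune or replace it). The upper-triangle zeros are never declared as parameters, so they are exact:

data {
  int<lower=0> N;
  int<lower=0> NY;
  int<lower=0> N_eta;
  array[N, NY] int<lower=0, upper=1> Y;
}
parameters {
  //all of W except the upper triangle of its leading N_eta x N_eta block,
  //which is fixed to 0 for identification
  vector<lower=0>[NY * N_eta - (N_eta * (N_eta - 1)) %/% 2] W_free;
  matrix[N, N_eta] Z_eta;
}
transformed parameters {
  matrix[NY, N_eta] W = rep_matrix(0, NY, N_eta);
  {
    int pos = 1;
    for (k in 1:N_eta) {
      for (j in k:NY) { //rows j < k of column k stay exactly 0
        W[j, k] = W_free[pos];
        pos += 1;
      }
    }
  }
}
model {
  matrix[N, NY] mu = Z_eta * W';
  W_free ~ gamma(2, 2); //stand-in prior; move to spike/slab once this works
  to_vector(Z_eta) ~ std_normal();
  target += bernoulli_logit_lupmf(to_array_1d(Y) | to_vector(mu'));
}

Since the loadings are constrained non-negative, sign flips are already ruled out, and the triangular zero pattern should take care of the rotations and column permutations.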

Thank you very much for the resources. I will go through them and make the necessary changes to the model.