Brms: mi() for discrete outcomes in an IRT model

I’m 99% sure that the following code is not going to work(it does at least compile), but I’m trying to wrap my head around how to modify the model to do what I’m looking for. The primary inspiration for imputing the missing values comes from this post on the discussion board. The meat of that solution to marginalizing out the missing binary outcome variable was including a loop over the missing values like this:

for(n in 1:N_missing) {
  target += log_mix(Ymi[n], bernoulli_logit_lpmf(1 | mu),
                            bernoulli_logit_lpmf(0 | mu));
}

Where Ymi is a vector of the probabilities that the missing data are 1 and (in my case) mu is the result of the item response model (i.e., beta + exp(logalpha) * theta). My hope is that I can use the IRT parameters estimated from those with observed data can be used to estimate the probability that a missing response is 0/1. I think this is helped by the fact that no one is missing responses to all the items, so there is still an opportunity to learn something about the latent trait from the partial item responses and then use that information to inform estimation of the missing responses. I understand that the mirt package is able to generate estimates for IRT models with missing data using full-information maximum likelihood, so I know there’s a way of approaching this problem – just not sure how to implement a Bayesian and Stan-friendly version of it.

Additionally, instead of trying to build in a mixture model to reflect my beliefs about different causes of missingness, I wanted to try to build in the prediction of the Ymi proportion via a beta regression. I’ve never written a beta regression in Stan, so I’m not anywhere close to confident in my implementation here. Additionally, I don’t know that a beta regression is going to really improve any estimation, but my desire is to test whether certain covariates (e.g., depression) might be related to the missingness and inform performance estimation. That’s the idea there, at least.

// generated with brms 2.15.0
functions {
  /* compute correlated group-level effects
  * Args: 
    *   z: matrix of unscaled group-level effects
  *   SD: vector of standard deviation parameters
  *   L: cholesky factor correlation matrix
  * Returns: 
    *   matrix of scaled group-level effects
  */ 
    matrix scale_r_cor(matrix z, vector SD, matrix L) {
      // r is stored in another dimension order than z
      return transpose(diag_pre_multiply(SD, L) * z);
    }
}

data {
  int<lower=1> N;  // total number of observations
  int Y[N];  // response variable
  int<lower=0> Nmi;  // number of missings
  int<lower=1> K_beta;  // number of population-level effects
  matrix[N, K_beta] X_beta;  // population-level design matrix
  int<lower=1> K_logalpha;  // number of population-level effects
  matrix[N, K_logalpha] X_logalpha;  // population-level design matrix
  int<lower=1> K;  // number of population-level effects for beta regression component
  matrix[N, K] X;  // population-level design matrix for beta regression component
  // data for group-level effects of ID 1
  int<lower=1> N_1;  // number of grouping levels
  int<lower=1> M_1;  // number of coefficients per level
  int<lower=1> J_1[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_1_theta_1;
  // data for group-level effects of ID 2
  int<lower=1> N_2;  // number of grouping levels
  int<lower=1> M_2;  // number of coefficients per level
  int<lower=1> J_2[N];  // grouping indicator per observation
  // group-level predictor values
  vector[N] Z_2_beta_1;
  vector[N] Z_2_logalpha_2;
  int<lower=1> NC_2;  // number of group-level correlations
  int prior_only;  // should the likelihood be ignored?
}

transformed data {
  int Kc = K - 1;
  matrix[N, Kc] Xc; // centered version of X without an intercept
  vector[Kc] means_X;  // column means of X before centering
  for (i in 2:K) {
    means_X[i - 1] = mean(X[, i]);
    Xc[, i - 1] = X[, i] - means_X[i-1];
  }
}

parameters {
  vector<lower=0,upper=1>[Nmi] Ymi; // probability that missing response is 1
  vector[K_beta] b_beta;  // population-level effects
  vector[K_logalpha] b_logalpha;  // population-level effects
  vector<lower=0>[M_1] sd_1;  // group-level standard deviations
  vector[N_1] z_1[M_1];  // standardized group-level effects
  vector<lower=0>[M_2] sd_2;  // group-level standard deviations
  matrix[M_2, N_2] z_2;  // standardized group-level effects
  cholesky_factor_corr[M_2] L_2;  // cholesky factor of correlation matrix
  vector[Kc] b;  // population-level effects for beta regression component
  real Intercept;  // temporary intercept for centered predictors of beta regression component
  real<lower=0> phi;  // precision parameter for beta regression component
}

transformed parameters {
  vector[N_1] r_1_theta_1;  // actual group-level effects
  matrix[N_2, M_2] r_2;  // actual group-level effects
  // using vectors speeds up indexing in loops
  vector[N_2] r_2_beta_1;
  vector[N_2] r_2_logalpha_2;
  r_1_theta_1 = (sd_1[1] * (z_1[1]));
  // compute actual group-level effects
  r_2 = scale_r_cor(z_2, sd_2, L_2);
  r_2_beta_1 = r_2[, 1];
  r_2_logalpha_2 = r_2[, 2];
}

model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N] nlp_theta = rep_vector(0.0, N);
    // initialize linear predictor term
    vector[N] nlp_beta = X_beta * b_beta;
    // initialize linear predictor term
    vector[N] nlp_logalpha = X_logalpha * b_logalpha;
    // initialize non-linear predictor term
    vector[N] mu;
    // initialize linear predictor term
    vector[Nmi] beta_mu = Intercept + Xc * b;
    for (n in 1:N) {
      // add more terms to the linear predictor
      nlp_theta[n] += r_1_theta_1[J_1[n]] * Z_1_theta_1[n];
    }
    for (n in 1:N) {
      // add more terms to the linear predictor
      nlp_beta[n] += r_2_beta_1[J_2[n]] * Z_2_beta_1[n];
    }
    for (n in 1:N) {
      // add more terms to the linear predictor
      nlp_logalpha[n] += r_2_logalpha_2[J_2[n]] * Z_2_logalpha_2[n];
    }
    for (n in 1:N) {
      // compute non-linear predictor values
      mu[n] = nlp_beta[n] + exp(nlp_logalpha[n]) * nlp_theta[n];
    }
    for(n in 1:Nmi) {
      // computing mixing ratios
      beta_mu[n] = inv_logit(beta_mu[n]);
    }
    target += beta_lpdf(Ymi | beta_mu * phi, (1 - mu) *phi);
    for(n in 1:Nmi) {
      // handle missing outcomes
      target += log_mix(Ymi[n], bernoulli_logit_lpmf(1 | mu),
                                bernoulli_logit_lpmf(0 | mu));
    }
    target += bernoulli_logit_lpmf(Y | mu);
  }
  // priors including constants
  target += student_t_lpdf(sd_1 | 3, 0, 2.5)
  - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  target += std_normal_lpdf(z_1[1]);
  target += student_t_lpdf(sd_2 | 3, 0, 2.5)
  - 2 * student_t_lccdf(0 | 3, 0, 2.5);
  target += std_normal_lpdf(to_vector(z_2));
  target += lkj_corr_cholesky_lpdf(L_2 | 1);
  target += student_t_lpdf(Intercept | 3, 0, 2.5);
  target += gamma_lpdf(phi | 0.01, 0.01);
}

generated quantities {
  // actual population-level intercepts for beta regression component
  real b_Intercept = Intercept - dot_product(means_X, b);
  // compute group-level correlations
  corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
  vector<lower=-1,upper=1>[NC_2] cor_2;
  // extract upper diagonal of correlation matrix
  for (k in 1:M_2) {
    for (j in 1:(k - 1)) {
      cor_2[choose(k - 1, 2) + j] = Cor_2[j, k];
    }
  }
}

I want to make sure that conceptually what I’ve outlined above for handling the missing outcome variables is reasonable or sensible within Stan and using IRT data. I have a sense that what I want is possible, but I’m an applied researcher and the technical realities of identifiability and implementation are often hard for me to recognize until I’ve played with the models.

I’d also love to know what in my code is wrong. I tried to adapt everything to be in line with brms language and standards, but some of the things that brms does to improve efficiency are just well beyond my own Stan understanding. Ideally, I’d like to be able to run this with a QR decomposition as well, but my first priority is getting the general framework up and running. Personally, I think that being able to handle missing item responses in IRT applications would be a great option for those of us interested in Bayesian IRT with Stan, so I hope that a solution to this problem will be of use to people other than myself.

1 Like