I’m 99% sure that the following code is not going to work (it does at least compile), but I’m trying to wrap my head around how to modify the model to do what I’m looking for. The primary inspiration for imputing the missing values comes from this post on the discussion board. The core of that solution for marginalizing out the missing binary outcome variable was a loop over the missing values like this:
for(n in 1:N_missing) {
target += log_mix(Ymi[n], bernoulli_logit_lpmf(1 | mu),
bernoulli_logit_lpmf(0 | mu));
}
Here Ymi is a vector of the probabilities that the missing responses are 1, and (in my case) mu is the linear predictor from the item response model (i.e., beta + exp(logalpha) * theta). As I understand it, log_mix(p, lp1, lp0) is just log(p * exp(lp1) + (1 - p) * exp(lp0)), i.e., the likelihood averaged over the two possible values of the missing response. My hope is that the IRT parameters estimated from the observed responses can be used to estimate the probability that each missing response is a 0 or a 1. I think this is helped by the fact that no one is missing responses to all of the items, so there is still an opportunity to learn something about the latent trait from the partial item responses and then use that information to inform estimation of the missing responses. I understand that the mirt package can estimate IRT models with missing data via full-information maximum likelihood, so I know there is a way of approaching this problem; I’m just not sure how to implement a Bayesian, Stan-friendly version of it.
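To make the marginalization idea concrete outside of the brms scaffolding, here is a stripped-down 2PL sketch of how I picture the observed/missing split (the names here, e.g., jj, ii_mis, p_mis, are made up for illustration and do not appear in my model below):

// Illustrative 2PL sketch (not my actual model): observed responses get the usual
// Bernoulli likelihood; missing responses are marginalized via log_mix.
data {
  int<lower=1> N_obs;                    // number of observed person-item responses
  int<lower=0> N_mis;                    // number of missing person-item responses
  int<lower=1> J;                        // number of persons
  int<lower=1> I;                        // number of items
  int<lower=1, upper=J> jj[N_obs];       // person index of each observed response
  int<lower=1, upper=I> ii[N_obs];       // item index of each observed response
  int<lower=0, upper=1> y[N_obs];        // observed 0/1 responses
  int<lower=1, upper=J> jj_mis[N_mis];   // person index of each missing response
  int<lower=1, upper=I> ii_mis[N_mis];   // item index of each missing response
}
parameters {
  vector[J] theta;                       // latent trait
  vector[I] beta;                        // item location
  vector[I] logalpha;                    // log discrimination
  vector<lower=0, upper=1>[N_mis] p_mis; // probability that a missing response is 1
}
model {
  theta ~ std_normal();
  beta ~ normal(0, 2);
  logalpha ~ normal(0, 0.5);
  // observed responses: ordinary 2PL likelihood
  for (n in 1:N_obs)
    y[n] ~ bernoulli_logit(beta[ii[n]] + exp(logalpha[ii[n]]) * theta[jj[n]]);
  // missing responses: mix over the two possible values, weighted by p_mis
  for (n in 1:N_mis) {
    real mu_n = beta[ii_mis[n]] + exp(logalpha[ii_mis[n]]) * theta[jj_mis[n]];
    target += log_mix(p_mis[n], bernoulli_logit_lpmf(1 | mu_n),
                                bernoulli_logit_lpmf(0 | mu_n));
  }
}

If that is a faithful translation of the discussion-board approach, then my question is really about whether replacing the flat prior on p_mis with a beta regression makes sense.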
Additionally, instead of trying to build a mixture model to reflect my beliefs about different causes of missingness, I wanted to try predicting the Ymi proportions via a beta regression. I’ve never written a beta regression in Stan, so I’m nowhere close to confident in my implementation here. I also don’t know whether a beta regression will actually improve the estimation, but my goal is to test whether certain covariates (e.g., depression) are related to the missingness and can inform the performance estimates. That’s the idea, at least.
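For what it’s worth, my mental model of a plain beta regression in Stan is the mean/precision parameterization below (again just an illustrative, self-contained sketch, not my actual code):

data {
  int<lower=1> N;                  // number of observations
  int<lower=1> K;                  // number of predictors (including the intercept column)
  matrix[N, K] X;                  // design matrix
  vector<lower=0, upper=1>[N] y;   // outcome, strictly between 0 and 1
}
parameters {
  vector[K] b;                     // regression coefficients
  real<lower=0> phi;               // precision of the beta distribution
}
model {
  vector[N] mu = inv_logit(X * b); // mean of the beta distribution, mapped to (0, 1)
  b ~ normal(0, 2);
  phi ~ gamma(0.01, 0.01);
  // mean/precision parameterization: shape1 = mu * phi, shape2 = (1 - mu) * phi
  y ~ beta(mu * phi, (1 - mu) * phi);
}

With that in mind, here is my full attempt: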
// generated with brms 2.15.0
functions {
/* compute correlated group-level effects
* Args:
* z: matrix of unscaled group-level effects
* SD: vector of standard deviation parameters
* L: cholesky factor correlation matrix
* Returns:
* matrix of scaled group-level effects
*/
matrix scale_r_cor(matrix z, vector SD, matrix L) {
// r is stored in another dimension order than z
return transpose(diag_pre_multiply(SD, L) * z);
}
}
data {
int<lower=1> N; // total number of observations
int Y[N]; // response variable
int<lower=0> Nmi; // number of missings
int<lower=1, upper=N> Jmi[Nmi]; // observation index of each missing response (needed to match Ymi and beta_mu to rows of mu)
int<lower=1, upper=N> Jobs[N - Nmi]; // observation index of each observed response
int<lower=1> K_beta; // number of population-level effects
matrix[N, K_beta] X_beta; // population-level design matrix
int<lower=1> K_logalpha; // number of population-level effects
matrix[N, K_logalpha] X_logalpha; // population-level design matrix
int<lower=1> K; // number of population-level effects for beta regression component
matrix[N, K] X; // population-level design matrix for beta regression component
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
int<lower=1> J_1[N]; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_theta_1;
// data for group-level effects of ID 2
int<lower=1> N_2; // number of grouping levels
int<lower=1> M_2; // number of coefficients per level
int<lower=1> J_2[N]; // grouping indicator per observation
// group-level predictor values
vector[N] Z_2_beta_1;
vector[N] Z_2_logalpha_2;
int<lower=1> NC_2; // number of group-level correlations
int prior_only; // should the likelihood be ignored?
}
transformed data {
int Kc = K - 1;
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i-1];
}
}
parameters {
vector<lower=0,upper=1>[Nmi] Ymi; // probability that missing response is 1
vector[K_beta] b_beta; // population-level effects
vector[K_logalpha] b_logalpha; // population-level effects
vector<lower=0>[M_1] sd_1; // group-level standard deviations
vector[N_1] z_1[M_1]; // standardized group-level effects
vector<lower=0>[M_2] sd_2; // group-level standard deviations
matrix[M_2, N_2] z_2; // standardized group-level effects
cholesky_factor_corr[M_2] L_2; // cholesky factor of correlation matrix
vector[Kc] b; // population-level effects for beta regression component
real Intercept; // temporary intercept for centered predictors of beta regression component
real<lower=0> phi; // precision parameter for beta regression component
}
transformed parameters {
vector[N_1] r_1_theta_1; // actual group-level effects
matrix[N_2, M_2] r_2; // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_2] r_2_beta_1;
vector[N_2] r_2_logalpha_2;
r_1_theta_1 = (sd_1[1] * (z_1[1]));
// compute actual group-level effects
r_2 = scale_r_cor(z_2, sd_2, L_2);
r_2_beta_1 = r_2[, 1];
r_2_logalpha_2 = r_2[, 2];
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] nlp_theta = rep_vector(0.0, N);
// initialize linear predictor term
vector[N] nlp_beta = X_beta * b_beta;
// initialize linear predictor term
vector[N] nlp_logalpha = X_logalpha * b_logalpha;
// initialize non-linear predictor term
vector[N] mu;
// initialize linear predictor term for the beta regression (rows with missing responses only, so dimensions match Ymi)
vector[Nmi] beta_mu = Intercept + Xc[Jmi] * b;
for (n in 1:N) {
// add more terms to the linear predictor
nlp_theta[n] += r_1_theta_1[J_1[n]] * Z_1_theta_1[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
nlp_beta[n] += r_2_beta_1[J_2[n]] * Z_2_beta_1[n];
}
for (n in 1:N) {
// add more terms to the linear predictor
nlp_logalpha[n] += r_2_logalpha_2[J_2[n]] * Z_2_logalpha_2[n];
}
for (n in 1:N) {
// compute non-linear predictor values
mu[n] = nlp_beta[n] + exp(nlp_logalpha[n]) * nlp_theta[n];
}
for(n in 1:Nmi) {
// computing mixing ratios
beta_mu[n] = inv_logit(beta_mu[n]);
}
target += beta_lpdf(Ymi | beta_mu * phi, (1 - beta_mu) * phi); // beta regression: mean beta_mu, precision phi
for (n in 1:Nmi) {
// handle missing outcomes: marginalize over whether the response would have been a 1
target += log_mix(Ymi[n], bernoulli_logit_lpmf(1 | mu[Jmi[n]]),
bernoulli_logit_lpmf(0 | mu[Jmi[n]]));
}
target += bernoulli_logit_lpmf(Y[Jobs] | mu[Jobs]); // observed responses only
}
// priors including constants
target += student_t_lpdf(sd_1 | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
target += std_normal_lpdf(z_1[1]);
target += student_t_lpdf(sd_2 | 3, 0, 2.5)
- 2 * student_t_lccdf(0 | 3, 0, 2.5);
target += std_normal_lpdf(to_vector(z_2));
target += lkj_corr_cholesky_lpdf(L_2 | 1);
target += student_t_lpdf(Intercept | 3, 0, 2.5);
target += gamma_lpdf(phi | 0.01, 0.01);
}
generated quantities {
// actual population-level intercepts for beta regression component
real b_Intercept = Intercept - dot_product(means_X, b);
// compute group-level correlations
corr_matrix[M_2] Cor_2 = multiply_lower_tri_self_transpose(L_2);
vector<lower=-1,upper=1>[NC_2] cor_2;
// extract upper diagonal of correlation matrix
for (k in 1:M_2) {
for (j in 1:(k - 1)) {
cor_2[choose(k - 1, 2) + j] = Cor_2[j, k];
}
}
}
I want to make sure that what I’ve outlined above for handling the missing outcome variable is conceptually reasonable within Stan and for IRT data. I have a sense that what I want is possible, but I’m an applied researcher, and the technical realities of identifiability and implementation are often hard for me to recognize until I’ve played with the models.
I’d also love to know what in my code is wrong. I tried to adapt everything to be in line with brms language and standards, but some of the things that brms does to improve efficiency are well beyond my own Stan understanding. Ideally, I’d also like to be able to run this with a QR decomposition, but my first priority is getting the general framework up and running. Personally, I think that being able to handle missing item responses in IRT applications would be a great option for those of us interested in Bayesian IRT with Stan, so I hope that a solution to this problem will be of use to people other than myself.
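In case it clarifies the QR point above: what I have in mind is the standard thin-QR reparameterization from the Stan user’s guide, applied to the centered design matrix of the beta regression component, roughly like this (sketch only; Q_ast, R_ast, and b_tilde are placeholder names):

transformed data {
  // thin QR decomposition of the centered design matrix Xc
  matrix[N, Kc] Q_ast = qr_thin_Q(Xc) * sqrt(N - 1);
  matrix[Kc, Kc] R_ast = qr_thin_R(Xc) / sqrt(N - 1);
  matrix[Kc, Kc] R_ast_inverse = inverse(R_ast);
}
parameters {
  vector[Kc] b_tilde; // coefficients on the Q_ast scale
}
model {
  // use Q_ast * b_tilde wherever Xc * b appears in the model above
}
generated quantities {
  // transform back to coefficients on the original scale of Xc
  vector[Kc] b = R_ast_inverse * b_tilde;
}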