Shortly after posting the previous update, I started working on the problem from a different angle. In essence, the approach I’m taking now is to compute the linear prediction component of the model and place a prior on this constructed value.
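As a bare-bones illustration of the idea, here is a toy logistic regression (hypothetical names N, K, X, y, b, and mu, separate from the model below) where the prior goes on the constructed predictor rather than on the coefficients:
data {
int<lower=1> N; // observations
int<lower=1> K; // predictors
matrix[N, K] X; // design matrix
array[N] int<lower=0, upper=1> y; // binary outcomes
}
parameters {
vector[K] b; // regression coefficients
}
transformed parameters {
vector[N] mu = X * b; // constructed linear predictor
}
model {
target += std_normal_lupdf(mu); // prior placed directly on the constructed value
y ~ bernoulli_logit(mu);
}
It’s long, but here’s the full Stan code that I’m currently working with: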
functions {
// scale random variables that are correlated
matrix scale_rand(matrix std, vector sigma, matrix omega) {
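// std: (n_effects x n_units) standard normals; returns an (n_units x n_effects) matrix of correlated effects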
return transpose(diag_pre_multiply(sigma, omega) * std);
}
} // functions block
data {
// long format item response theory inputs
int<lower=1> I; // number of items
int<lower=1> J; // number of persons
int<lower=1> N; // number of observations (up to I * J)
array[N] int<lower=1, upper=I> ii; // index of items
array[N] int<lower=1, upper=J> jj; // index of persons
array[N] int<lower=0, upper=1> y; // responses
// latent linear trait model inputs
int<lower=0> K_beta_b; // number of fixed predictors on difficulty
int<lower=0> K_beta_a; // number of fixed predictors on discrimination
matrix[I, K_beta_b] X_b; // covariate matrix for difficulty
matrix[I, K_beta_a] X_a; // covariate matrix for discrimination
// latent regression model inputs
int<lower=0> K_beta_j; // number of fixed predictors on ability
matrix[J, K_beta_j] X_j; // covariate matrix for ability
// random item model inputs
int<lower=0> K_rnd_b; // number of random covariates on difficulty
int<lower=0> K_rnd_a; // number of random covariates on discrimination
matrix[I, K_rnd_b] R_b; // covariate matrix for difficulty
matrix[I, K_rnd_a] R_a; // covariate matrix for discrimination
// random person inputs
int<lower=1> K_rnd_j; // number of random covariates on ability (must include intercept)
matrix[J, K_rnd_j] R_j; // covariate matrix for ability
// item-by-person covariates
int<lower=0> K_ibj_b; // number of item-by-person covariates on difficulty
int<lower=0> K_ibj_a; // number of item-by-person covariates on discrimination
matrix[N, K_ibj_b] X_b_j; // item-by-person covariate matrix for difficulty
matrix[N, K_ibj_a] X_a_j; // item-by-person covariate matrix for discrimination
// pervasive bias model inputs
int<lower=0> K_bias_j; // number of explanatory person-variables
int<lower=0> K_bias_b; // number of difficulty item covariates to evaluate
int<lower=0> K_bias_a; // number of discrimination item covariates to evaluate
array[K_bias_j] int bias_j; // column number(s) for person variable(s) (bias_j[1] *must* be the group variable)
array[K_bias_b] int bias_b; // column number(s) for difficulty variable(s)
array[K_bias_a] int bias_a; // column number(s) for discrimination variable(s)
// informative priors
matrix[4, I] prior_i; // item priors (rows = mu_beta, sd_beta, mu_logalpha, sd_logalpha; columns = items)
vector[2] prior_j; // person priors (mu_theta, sd_theta)
} // data block
transformed data {
// identifiers for random items effects by parameter
array[(K_rnd_b + (K_bias_b > 0))] int id_rnd_b;
array[(K_rnd_a + (K_bias_a > 0))] int id_rnd_a;
// identifiers for random person effects by parameter
array[K_rnd_j] int id_rnd_j;
array[K_bias_b] int id_bias_b;
array[K_bias_a] int id_bias_a;
// combine person random effects into single matrix
matrix[J, (K_rnd_j + K_bias_b + K_bias_a)] R_j_c;
// create random item effect identifiers
for ( k in 1 : (K_rnd_b + (K_bias_b > 0)) ) id_rnd_b[k] = k;
for ( k in 1 : (K_rnd_a + (K_bias_a > 0)) ) id_rnd_a[k] = k + (K_rnd_b + (K_bias_b > 0));
// create random person effect identifiers
for ( k in 1 : K_rnd_j ) id_rnd_j[k] = k;
for ( k in 1 : K_bias_b ) id_bias_b[k] = k + K_rnd_j;
for ( k in 1 : K_bias_a ) id_bias_a[k] = k + K_rnd_j + K_bias_b;
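// e.g., K_rnd_j = 1, K_bias_b = 2, K_bias_a = 1 gives id_rnd_j = {1}, id_bias_b = {2, 3}, id_bias_a = {4}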
// create person random effects matrix
{
matrix[J, (K_rnd_j + K_bias_b)] R_j_temp = append_col(R_j, rep_matrix(rep_vector(1.0, J), K_bias_b));
R_j_c = append_col(R_j_temp, rep_matrix(rep_vector(1.0, J), K_bias_a));
} // local environment
// convenience count of random effects on persons
int n_rnd_j = K_rnd_j + K_bias_b + K_bias_a;
// convenience count of random effects on items
int n_rnd_i = K_rnd_b + K_rnd_a + (K_bias_b > 0) + (K_bias_a > 0);
// index for all covariates examined outside of pervasive bias
array[(K_beta_b - K_bias_b)] int unbias_b;
array[(K_beta_a - K_bias_a)] int unbias_a;
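// e.g., K_beta_b = 4 with bias_b = {2} gives unbias_b = {1, 3, 4} (assumes bias_b and bias_a are sorted ascending)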
{
int b_b = 1; // initialize indicator on difficulty included in bias
int b_a = 1; // initialize indicator on discrimination included in bias
int f_b = 1; // initialize indicator on difficulty excluded from bias
int f_a = 1; // initialize indicator on discrimination excluded from bias
if ( K_bias_b > 0 ) {
for ( k in 1 : K_beta_b ) {
if ( b_b <= K_bias_b && k == bias_b[b_b] ) {
b_b += 1; // increment indicator for next biased variable
} else {
unbias_b[f_b] = k;
f_b += 1; // increment indicator for next non-biased variable
} // if statement
} // for loop
} else {
for ( k in 1 : K_beta_b ) unbias_b[k] = k;
} // if statement
if ( K_bias_a > 0 ) {
for ( k in 1 : K_beta_a ) {
if ( b_a <= K_bias_a && k == bias_a[b_a] ) {
b_a += 1; // increment indicator for next biased variable
} else {
unbias_a[f_a] = k;
f_a += 1; // increment indicator for next non-biased variable
} // if statement
} // for loop
} else {
for ( k in 1 : K_beta_a ) unbias_a[k] = k;
} // if statement
} // local environment
} // transformed data block
parameters {
// latent linear trait parameters
vector[(K_beta_b - K_bias_b)] beta_b_raw; // coefficients for difficulty, non-centered parameterization
vector[(K_beta_a - K_bias_a)] beta_a_raw; // coefficients for discrimination, non-centered parameterization
// latent regression parameters
vector[K_beta_j] beta_j_raw; // coefficients for ability, non-centered parameterization
// random item parameters
cholesky_factor_corr[n_rnd_i] omega_i; // correlation of random item covariates
matrix[n_rnd_i, I] z_par_i; // standardized random item covariates
vector<lower=0>[n_rnd_i] s_par_i; // standard deviation of random item covariates
// random person parameters
cholesky_factor_corr[n_rnd_j] omega_j; // correlation of random person covariates
matrix[n_rnd_j, J] z_par_j; // standardized random person covariates
vector<lower=0>[n_rnd_j] s_par_j; // standard deviation of random person covariates
// item-by-person parameters
vector[K_ibj_b] beta_b_j_raw; // coefficients for difficulty-person, non-centered parameterization
vector[K_ibj_a] beta_a_j_raw; // coefficients for discrimination-person, non-centered parameterization
// pervasive bias parameters
array[K_bias_b] vector[K_bias_j] beta_bias_b_j_raw; // person coefficients for each item covariate on difficulty subject to bias
array[K_bias_a] vector[K_bias_j] beta_bias_a_j_raw; // person coefficients for each item covariate on discrimination subject to bias
} // parameters block
transformed parameters {
// IRT parameters
vector[N] beta;
vector[N] alpha;
vector[N] theta;
// extract random item parameters
matrix[I, (K_rnd_b + (K_bias_b > 0))] rand_b;
matrix[I, (K_rnd_a + (K_bias_a > 0))] rand_a;
{
matrix[I, n_rnd_i] int_i = scale_rand(z_par_i, s_par_i, omega_i);
rand_b = int_i[ : , id_rnd_b];
rand_a = int_i[ : , id_rnd_a];
} // local environment
// extract random person parameters
matrix[J, n_rnd_j] rand_j = scale_rand(z_par_j, s_par_j, omega_j);
vector[N] beta_raw = rep_vector(0.0, N);
vector[N] logalpha_raw = rep_vector(0.0, N);
vector[N] theta_raw = rep_vector(0.0, N);
beta_raw += X_b[ii, unbias_b] * beta_b_raw + X_b_j * beta_b_j_raw;
logalpha_raw += X_a[ii, unbias_a] * beta_a_raw + X_a_j * beta_a_j_raw;
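// decompose the coefficient of each biased item covariate into a person-level regression plus a person random slope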
for ( k in 1 : K_bias_b ) {
beta_raw += X_b[ii, bias_b[k]] .* (X_j[jj, bias_j] * beta_bias_b_j_raw[k, : ] +
R_j_c[jj, id_bias_b[k]] .* rand_j[jj, id_bias_b[k]]);
}
for ( k in 1 : K_bias_a ) {
logalpha_raw += X_a[ii, bias_a[k]] .* (X_j[jj, bias_j] * beta_bias_a_j_raw[k, : ] +
R_j_c[jj, id_bias_a[k]] .* rand_j[jj, id_bias_a[k]]);
}
theta_raw += X_j[jj] * beta_j_raw;
// rescale IRT parameters
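// non-centered: parameter = prior mean + prior sd * raw draw, with alpha handled on the log scale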
for ( n in 1 : N ) {
beta[n] = beta_raw[n] * prior_i[2, ii[n]] + prior_i[1, ii[n]];
alpha[n] = exp(logalpha_raw[n] * prior_i[4, ii[n]] + prior_i[3, ii[n]]);
}
theta = theta_raw * prior_j[2] + prior_j[1];
} // transformed parameters block
model {
// non-centered IRT parameters
target += std_normal_lupdf(beta_raw);
target += std_normal_lupdf(logalpha_raw);
target += std_normal_lupdf(theta_raw);
// non-centered latent linear trait priors
beta_b_raw ~ student_t(3, 0, 1);
beta_a_raw ~ student_t(3, 0, 1);
// non-centered latent regression priors
beta_j_raw ~ student_t(3, 0, 1);
// non-centered random item priors
omega_i ~ lkj_corr_cholesky(1);
to_vector(z_par_i) ~ std_normal();
s_par_i ~ student_t(3, 0, 1);
// non-centered random person priors
omega_j ~ lkj_corr_cholesky(1);
to_vector(z_par_j) ~ std_normal();
s_par_j ~ student_t(3, 0, 1);
// non-centered item-by-person priors
beta_b_j_raw ~ student_t(3, 0, 1);
beta_a_j_raw ~ student_t(3, 0, 1);
// non-centered pervasive bias parameters
for ( k in 1 : K_bias_b ) beta_bias_b_j_raw[k, : ] ~ student_t(3, 0, 1);
for ( k in 1 : K_bias_a ) beta_bias_a_j_raw[k, : ] ~ student_t(3, 0, 1);
// likelihood
{
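// linear predictor in slope-intercept form: eta = alpha * theta + beta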
vector[N] eta = alpha .* theta + beta;
y ~ bernoulli_logit(eta);
}
} // model block
I haven’t robustly tested the model yet, but it seems to be generally doing what I want. The optimize and ADVI estimation methods seem to run reliably without issues. Sampling is, of course, much slower and thus harder to test. Running on a sample of I = 40 and N = 2075 (about the sample size I need it to work on), it took about 20 hours for the standard 1000/1000 warmup/sampling iterations. The only overt warning/error from the model was that every post-warmup iteration hit the maximum tree depth, and some quick eyeballing of the output suggests no deeply pathological outcomes. I also only tested the case where K_bias_j, K_bias_a, and K_bias_b were all zero, which is not the desired end use case (before trying that out, I need to add to the data block a way of passing informative priors for the coefficients of the item covariates that are decomposed by these other variables).
The model that I’m shooting for should be something like this (apologies if the notation isn’t exactly right here):
y_{ijg} \sim \mathrm{Bernoulli}(\mathrm{logit}^{-1}(\eta_{ijg}))
\eta_{ijg} = \alpha_{ijg}(\theta_{jg} - \beta_{ijg})
\alpha_{ijg} = \alpha_0 + \alpha_{0i} + \gamma_{i}^{\alpha}Z_g + \sum_{p=1}^{P}\left(\delta_{jp}^{\alpha} + \omega_{gp}^{\alpha}Z_{g} + \sum_{k=1}^{K}\omega_{pk}^{\alpha}Z_{jk}\right)X_{ip}
\beta_{ijg} = \beta_0 + \beta_{0i} + \gamma_{i}^{\beta}Z_g + \sum_{p=1}^{P}\left(\delta_{jp}^{\beta} + \omega_{gp}^{\beta}Z_{g} + \sum_{k=1}^{K}\omega_{pk}^{\beta}Z_{jk}\right)X_{ip}
\theta_{jg} = \theta_{0j} + \omega_{g}^{\theta}Z_{g} + \sum_{k=1}^{K}\omega_{k}^{\theta}Z_{jk}
Where i indexes the item, j the person, and g the group (a dichotomous variable in this case). Variables affecting the item parameters are represented by X and person covariates by Z, and there are a total of P item covariates and K person covariates of interest.
I’d love to make the model as efficient as possible since there are some additional things that I’ll need to add to it (e.g., I need a missing data mechanism for some partially observed person covariates). If it is relevant to those recommendations, I’m using a 2021 MacBook Pro with an M1 Max processor (10 cores). I’d also greatly appreciate any suggestions for handling the missing data on the covariates – most of the missing variables are continuous, but some are ordinal and a few more are binary.
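For the continuous ones, I’m assuming the standard Stan device of declaring the missing entries as parameters would carry over; a minimal sketch with hypothetical names (x_obs, x_mis, ii_obs, ii_mis):
data {
int<lower=0> N_obs; // number of observed entries
int<lower=0> N_mis; // number of missing entries
array[N_obs] int<lower=1, upper=N_obs + N_mis> ii_obs; // positions of observed entries
array[N_mis] int<lower=1, upper=N_obs + N_mis> ii_mis; // positions of missing entries
vector[N_obs] x_obs; // observed covariate values
}
parameters {
vector[N_mis] x_mis; // missing values treated as unknowns
real mu_x; // imputation-model mean (prior omitted for brevity)
real<lower=0> sigma_x; // imputation-model sd (prior omitted for brevity)
}
transformed parameters {
vector[N_obs + N_mis] x; // completed covariate vector
x[ii_obs] = x_obs;
x[ii_mis] = x_mis;
}
model {
x ~ normal(mu_x, sigma_x); // imputation model; x is then used downstream as usual
}
The ordinal and binary ones presumably can’t be handled this way, since Stan has no discrete parameters, so I imagine those would need to be marginalized out or given some latent continuous representation.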
I’m also pretty far outside my comfort zone when it comes to writing models in Stan (or really in any kind of non-lme4-style notation, for that matter). So, if someone spots an issue or mistake somewhere, I’d greatly appreciate the sanity check!