# Priors for 2PL IRT Models

I am looking for advice on how to specify prior distributions in a two-parameter logistic item response theory model with predictor variables. In setting up my models, I follow @paul.buerkner’s Bayesian Item Response Modeling in R with brms and Stan and constrain the respondent latent variable (“ability”) \theta_j to follow a standard normal distribution.

Once I add respondent-level predictor variables, however, this strategy constrains the *residual* standard deviation of the respondent latent variable to be 1. Instead, I would like the *total* variance of the respondent component to be 1, so that the item characteristics (\alpha_i and \beta_i) are comparable in magnitude across the models with and without predictors.

So far, the only way I have found to achieve this is to z-standardize \theta_j in the transformed parameters block:

```stan
data {
  int<lower = 1> N;                        // number of responses
  int<lower = 1> M;                        // number of predictors
  int<lower = 1> I;                        // number of actions (items)
  int<lower = 1> J;                        // number of participants
  array[N] int<lower = 1, upper = I> ii;   // item index per response
  array[N] int<lower = 1, upper = J> jj;   // participant index per response
  array[N] int<lower = 0, upper = 1> y;    // binary responses
  matrix[J, M] X;                          // matrix of (z-standardized) predictors
}
parameters {
  vector[J] z_J;                           // standardized participant effects
  matrix[2, I] z_I;                        // standardized item effects
  vector<lower = 0>[2] sigma_I;
  cholesky_factor_corr[2] L_I;
  real mu_alpha;
  real mu_beta;
  vector[M] b_x;                           // regression coefficients
}
transformed parameters {
  matrix[I, 2] r_I = transpose(diag_pre_multiply(sigma_I, L_I) * z_I);
  vector<lower = 0>[I] alpha = exp(mu_alpha + r_I[, 1]);  // discriminations
  vector[I] beta = mu_beta + r_I[, 2];                    // item locations
  real sigma_J = 1.0;                      // residual SD of theta (fixed)
  vector[J] theta = X * b_x + sigma_J * z_J;
  vector[J] theta_z = (theta - mean(theta)) / sd(theta);  // total SD forced to 1
}
model {
  z_J ~ std_normal();
  to_vector(z_I) ~ std_normal();
  sigma_I ~ student_t(3, 0, 2.5);
  L_I ~ lkj_corr_cholesky(2);
  mu_beta ~ student_t(3, 0, 2.5);
  mu_alpha ~ std_normal();
  b_x ~ student_t(3, 0, 2.5);
  y ~ bernoulli_logit(alpha[ii] .* (theta_z[jj] + beta[ii]));
}
generated quantities {
  corr_matrix[2] R_I = multiply_lower_tri_self_transpose(L_I);
}
```

1. Is there a more elegant way of doing this that allows me to directly assign priors to the regression coefficients so that \theta_j = \text{X} b + b_j \sim \mathcal{N}(0, 1)?[1]
2. Any ideas on how to implement a regularized horseshoe prior or similar regularizing prior for the regression coefficients?
3. Is there a way to implement this model in brms?

Here is an almost-equivalent model specified in brms:

```r
brm(
  formula = bf(
    y ~ exp(logalpha) * eta,
    eta ~ 1 + x1 + ... + (1|i|ii) + (1|jj),
    logalpha ~ 1 + (1|i|ii),
    nl = TRUE
  ),
  family = brmsfamily(family = "bernoulli", link = "logit"),
  prior = c(
    prior(student_t(3, 0, 2.5), class = b, nlpar = eta),
    prior(lkj_corr_cholesky(2), class = L),
    prior(constant(1), class = sd, group = jj, nlpar = eta),
    prior(student_t(3, 0, 2.5), class = sd, group = ii, nlpar = eta),
    prior(normal(0, 1), class = b, nlpar = logalpha),
    prior(student_t(3, 0, 2.5), class = sd, group = ii, nlpar = logalpha)
  ),
  data = d_sim
)
```


1. In other words, \mathrm{Var}[\text{X} b] + \sigma_j^2 = 1 if I am not mistaken. ↩︎
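To sanity-check the footnote's identity numerically (all numbers hypothetical): if one sets \sigma_j = \sqrt{1 - \mathrm{Var}[\text{X} b]}, then \theta = \text{X} b + \sigma_j z has total variance approximately 1 without any post-hoc standardization.

```python
import numpy as np

# Hypothetical check of Var[X b] + sigma_j^2 = 1 (the footnote's identity).
rng = np.random.default_rng(7)
J, M = 200_000, 2
X = rng.standard_normal((J, M))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # z-standardized predictors
b = np.array([0.5, 0.3])

var_Xb = np.var(X @ b)                    # empirical Var[X b]
sigma_j = np.sqrt(1.0 - var_Xb)           # residual SD implied by the identity
theta = X @ b + sigma_j * rng.standard_normal(J)

print(np.isclose(np.var(theta), 1.0, atol=0.02))  # True
```

Since X is known data, something analogous could in principle be computed in the transformed parameters block (e.g., defining the residual SD as a function of b_x), though that only works as long as \mathrm{Var}[\text{X} b] stays below 1, and I am not sure how well it would sample.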

This doesn’t necessarily answer your questions, but my inclination would be to make the alpha/beta parameters comparable by applying an IRT linking or equating method after estimation, as opposed to changing the model itself. That would also seem to make it easier to implement regularizing priors.

Hi @edm, thank you for your thoughts! That’s what I have been doing so far. But how would that make it easier to implement regularizing priors? What kind of priors did you have in mind?

I might be misunderstanding what you are trying to do. But I was thinking you were trying to specify priors on parameters that would then be involved in a standardization. That could make it difficult to specify the priors meaningfully before standardization (at least, difficult if you were using semi-informative priors). If you avoided the standardization during model estimation and instead dealt with linking/equating afterwards, then I thought the prior specification could become easier. A couple of other thoughts:

• Maybe you have seen it already, but horseshoe priors appear to already be available in brms; see https://paul-buerkner.github.io/brms/reference/horseshoe.html

• Constraining the total variance reminded me of the SEM way of handling these models, where they distinguish between the “delta” and “theta” parameterizations. The delta parameterization is basically “constrain the total variance to 1”, and the theta parameterization is “constrain the residual variance to 1”. In Bayesian SEM, I think people almost always rely on the theta parameterization, because it is tricky to constrain functions of parameters in a Bayesian context. This would appear to mean that no easy solution exists.
