Dear Jill,
For debugging purposes I would suggest reducing the amount of data you simulate and running the model on a single chain. That makes it much easier to locate errors in the model code (see the fitting sketch at the end).
The most obvious issue with your multi-group model specification is the indexing of beta_raw and beta. If you swap the indexing from vector[K] beta[I] to vector[I] beta[K], you define the intended array of length K containing vectors of length I. The group-specific item difficulty parameter vectors can then be accessed as beta[k].
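To spell out the difference between the two declarations:

vector[K] beta[I]; // array of I vectors of length K: beta[i] collects the K group values for item i
vector[I] beta[K]; // array of K vectors of length I: beta[k] holds all item difficulties for group k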
With this change your model will run, but there are still divergent transitions.
In your model code you also write:
mu[1] ~ normal(0, 1); // mu group 1
mu[2] ~ normal(1, 1); // mu group 2; to avoid label switching problem
I think this non-exchangeable prior on mu is not required here. Label switching generally refers to an issue in mixture models, which is not the case for your model, as the grouping variable kk is observed. In my opinion, just specifying mu ~ normal(0, 1) should be fine.
For IRT models I have had good experiences with the following optimizations:
- a non-centered parameterization for the theta parameters (see the sketch after this list)
- dropping the sum-to-zero constraint on the beta parameters and instead using the distributional assumption beta ~ normal(0, 1)

The issue with sum-to-zero constraints is that a prior distribution on the unconstrained item parameters results in a non-exchangeable prior distribution for all item parameters. There is a thread on Discourse that goes into great detail about soft vs. hard constraints, how they relate to sampling efficiency, and how they interact with prior distributions.
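To make the non-centered parameterization concrete, assuming your original model sampled theta directly, the change amounts to:

// centered: theta is sampled directly, which often causes divergences in hierarchical models
theta ~ normal(mu[kk], sigma);
// non-centered (used in the model below): sample a standardized deviate,
// then rescale it in transformed parameters
theta_norm ~ std_normal();
theta = mu[kk] + theta_norm * sigma;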
Implementing the changes described above results in this model:
data {
int<lower=1> I; // number of items
int<lower=1> J; // number of students
int<lower=1> N; // total number of observations (student x items)
int<lower=1> K; // number of groups
int<lower=1, upper=I> ii[N]; // variable indexing the items
int<lower=1, upper=J> jj[N]; // variable indexing the students
int<lower=0, upper=1> y[N]; // binary variable outcome
int<lower=1, upper=K> kk[J]; // variable indexing the group membership (1 to K)
}
parameters {
vector[I] beta[K]; // difficulty of each item, one vector per group
vector[J] theta_norm; // student ability
vector[K] mu; // mean theta of every group
real<lower=0> sigma; // hyperprior theta
}
transformed parameters {
vector[J] theta;
theta = mu[kk] + theta_norm * sigma;
}
model {
for (k in 1:K) {
beta[k] ~ normal(0, 1); // prior on the beta
}
mu ~ std_normal();
theta_norm ~ std_normal();
sigma ~ exponential(.1); // hyperprior theta
for (n in 1:N) {
y[n] ~ bernoulli_logit(theta[jj[n]] - beta[kk[jj[n]], ii[n]]);
}
}
Please note that there is one change in the data: kk is now of length J, holding one group index per student. For the simulated data I just took the appropriate subset of your long data, long[seq_len(2 * n), "group"].
library(pcIRT)
n <- 4000
sim_data <- simDRM(c(-0.4, 0.2, 0.2), persons = n)
data <- sim_data$datmat
sim_data2 <- simDRM(c(-0.8, 0.4, 0.4), persons = n)
data2 <- sim_data2$datmat
data_new <- rbind(data, data2)
data_new <- as.data.frame(data_new)
names(data_new) <- c("y1", "y2", "y3")
data_new$X <- 1:nrow(data_new)
# Placing the data in long format and defining a numeric identifier for items
library(reshape2)
long <- reshape2::melt(data_new, id.vars = "X", variable.name = "variable", value.name = "value")
long$group <- rep(1:2, each = n, times = 3)
key <- 1:length(unique(long$variable))
names(key) <- unique(long$variable)
long$item.id <- key[long$variable]
stan_data <- list(I = max(long$item.id),
J = max(long$X),
N = nrow(long),
K = 2,
ii = long$item.id,
jj = long$X,
kk = long[seq_len(2 * n), "group"],
y = long$value)
I did not do much validation of the code, but the beta parameters seem to be recovered just fine.
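In case it is useful, here is a minimal sketch of how I would fit the model and eyeball the recovery with rstan (the filename irt_multigroup.stan is just a placeholder for wherever you save the Stan code):

library(rstan)
# a single chain and few iterations keep debugging runs fast, as suggested above
fit <- stan(file = "irt_multigroup.stan", data = stan_data,
            chains = 1, iter = 1000, seed = 1)
print(fit, pars = c("beta", "mu", "sigma")) # compare beta to the simulated item parameters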
I hope this helps. Best regards,
Philipp