Dear Stan users,
I would like to fit a multi-group Rasch model in stan. So, I already fitted a simple Rasch model with one group, using a sum-to-zero constraint on the beta parameters. This seems to work fine if I compare it to the output of other R-packages (such as psychomix).
data {
int<lower=1> I; // number of items
int<lower=1> J; // number of students
int<lower=1> N; // total number of observations (student x items)
int<lower=1, upper=I> ii[N]; // variable indexing the items
int<lower=1, upper=J> jj[N]; // variable indexing the students
int<lower=0, upper=1> y[N]; // binary variable outcome
}
parameters {
vector[I-1] beta_raw; // difficulty of item
vector[J] theta; // student ability
real<lower=0> sigma;
}
transformed parameters {
vector[I] beta = append_row(beta_raw, -sum(beta_raw)); // add sum-to-zero constrain
}
model {
beta_raw ~ normal(0, 3); // prior on the beta
sigma ~ exponential(.1) // hyperprior
theta ~ normal(0, sigma); // prior on the theta
y ~ bernoulli_logit(theta[jj] - beta[ii]);
}
I now want to extend my model into a multi-group form. I want the groups to be able to differ in mu theta and in beta parameters. And now the sum-to-zero constraint on the beta parameters should be per group. To see if I can recover the parameters I simulated data in R:
library(pcIRT)
sim_data <- simDRM(c(-0.4, 0.2, 0.2), persons = 10000)
data <- sim_data$datmat
sim_data2 <- simDRM(c(-0.8, 0.4, 0.4), persons = 10000)
data2 <- sim_data2$datmat
data_new <- rbind(data, data2)
data_new <- as.data.frame(data_new)
names(data_new) <- c("y1", "y2", "y3")
data_new$X <- 1:nrow(data_new)
# Placing the data in long format and defining a numeric identifier for items
library(reshape)
long <- reshape::melt(data_new, id.vars = "X", variable.name = "variable", value.name = "value")
long$group <- rep(1:2, each = 10000, times = 3)
key <- 1:length(unique(long$variable))
names(key) <- unique(long$variable)
long$item.id <- key[long$variable]
stan_data <- list(I = max(long$item.id),
J = max(long$X),
N = nrow(long),
K = 2,
ii = long$item.id,
jj = long$X,
kk = long$group,
y = long$value)
My stan model now looks like this:
data {
int<lower=1> I; // number of items
int<lower=1> J; // number of students
int<lower=1> N; // total number of observations (student x items)
int<lower=1> K; // number of groups
int<lower=1, upper=I> ii[N]; // variable indexing the items
int<lower=1, upper=J> jj[N]; // variable indexing the students
int<lower=0, upper=K> kk[N]; // variable indexing the group membership
int<lower=0, upper=1> y[N]; // binary variable outcome
}
parameters {
vector[K] beta_raw[I-1]; // difficulty of item
vector[J] theta; // student ability
vector[K] mu; // mean theta of every group
real<lower=0> sigma; // hyperprior theta
}
transformed parameters {
vector[K] beta[I];
for (k in 1:K) {
beta[k] = append_row(beta_raw[k], -sum(beta_raw[k]));
}
}
model {
for (k in 1:K) {
beta_raw[k] ~ normal(0, 3); // prior on the beta
}
mu[1] ~ normal(0, 1); // mu group 1
mu[2] ~ normal(1, 1); // mu group 2; to avoid label switching problem
sigma ~ exponential(.1); // hyperprior theta
for (n in 1:N) {
theta[jj[n]] ~ normal(mu[kk[n]], sigma); // prior on the theta
}
for (n in 1:N) {
y[n] ~ bernoulli_logit(theta[jj[n]] - beta[kk[n]][ii[n]]);
}
}
Running this program takes incredibly long (it seems to be stuck at 1 / 2000 [ 0%] (Warmup)) so I had to stop it. I tried going for a matrix where I now use vector[K] beta[I];, but then I can’t seem to get the sum-to-zero to constrain working. So, I am a bit stuck. Does anyone see what I am doing wrong?
Also, I know vectorization is preferred in Stan; but if I change the last two loops of my model block, I get an error (“Available argument signatures for operator-: Expression ill performed”)
theta[jj] ~ normal(mu[kk], sigma); // prior on the theta
y ~ bernoulli_logit(theta[jj] - beta[kk][ii]); // ERROR: Expression is ill performed
Hopefully, somebody can help me to fit my first multi-group Rasch model!
Thanks,
Jill