Dear Stan users,
I fitted a mixture Rasch model with two classes, where I use the latent Dirichlet allocation model. However, I am not able to recover the parameters from generated data.
Given j = 1, …, J students, i = 1, …, I items, k = 1, …, K latent classes, the probability of an examine j responding correctly to an item i is as follows:
p(y_{ijk} = 1 |θ_{kj},b_{ki},k_{j}) = \dfrac{e^{(θ_{kj}−b_{ki})}}{1+e^{(θ_{kj}−b_{ki})}}
I added non-centered parameterizations for the theta parameters to my model. So this is my model in Stan:
data{
int<lower=1> J; // number of students
int<lower=1> I; // number of items
int<lower=1> K; // number of groups
int<lower=1> N; // total number of observations (student x items)
int<lower=1, upper=I> ii[N]; // variable indexing the items
int<lower=1, upper=J> jj[N]; // variable indexing the students
int<lower=0, upper=1> y[N]; // binary variable outcome
}
parameters{
real<lower=0> sigma1; // theta1 variance hyperprior
real<lower=0> sigma2; // theta2 variance hyperprior
vector[I] beta1; // item difficulties (group 1)
vector[I] beta2; // item difficulties (group 2)
vector[J] theta1_norm; // person parameters (group 1)
vector[J] theta2_norm; // person parameters (group 2)
vector[K] mu; // mean theta of every group
simplex[K] phi; // mixing proportions
}
transformed parameters{
real<lower=0,upper=1> pCorr1[N]; // probability correct (group 1)
real<lower=0,upper=1> pCorr2[N]; // probability correct (group 2)
vector[J] theta1;
vector[J] theta2;
theta1 = mu[1] + theta1_norm * sigma1;
theta2 = mu[2] + theta2_norm * sigma2;
for(n in 1:N){
pCorr1[n] = 1/(1+exp(-theta1[jj[n]] + beta1[ii[n]]));
pCorr2[n] = 1/(1+exp(-theta2[jj[n]] + beta2[ii[n]]));
}
}
model{
// prior person parameter (theta)
sigma1 ~ exponential(.1);
sigma2 ~ exponential(.1);
theta1_norm ~ normal(0, 1);
theta2_norm ~ normal(0, 1);
mu[1] ~ std_normal(); // mu (group 1)
mu[2] ~ normal(1, 1); // mu (group 2); to avoid label switching problem
// prior item parameter (beta)
beta1 ~ normal(0, 1);
beta2 ~ normal(0, 1);
// prior mixture
phi ~ dirichlet([1, 1]');
// likelihood
for(n in 1:N) {
target += log_mix(phi[1],
bernoulli_lpmf(y[n] | pCorr1[n]),
bernoulli_lpmf(y[n] | pCorr2[n]));
}
}
To check the performance of my model, I generated some data in R:
library("reshape")
library("pcIRT")
n = 500
data1 <- simDRM(c(-0.4, 0.2, 0.2), persons = n)$datmat
data2 <- simDRM(c(-0.8, 0.4, 0.4), persons = n)$datmat
data <- as.data.frame(rbind(data1, data2))
names(data) <- c("y1", "y2", "y3")
data$X <- 1:nrow(data)
# Placing the data in long format and defining a numeric identifier for items
long <- reshape::melt(data, id.vars = "X", variable.name = "variable", value.name = "value")
key <- 1:length(unique(long$variable))
names(key) <- unique(long$variable)
long$item.id <- key[long$variable]
stan_data <- list(I = max(long$item.id),
J = max(long$X),
N = nrow(long),
K = 2,
ii = long$item.id,
jj = long$X,
y = long$value)
So, I can’t seem to recover the beta values (even with different prior specifications). Does anyone see if I am doing something wrong or how I could improve my code?
Hopefully, somebody can help me!
Thanks,
Jill