I am fitting a multilevel 2PL IRT model to simulated data with rstan. I have been tweaking the parameters for a week and still cannot really understand what is going on with rstan and the results.
My data looks like this: 20_10_0.3_TwoPL_x1.txt (797.1 KB)
My Stan program looks like this:
data {
  int<lower=1> N;                        // number of students
  int<lower=1> NFam;                     // number of item families
  int<lower=1> NObs;                     // number of observations
  int<lower=1,upper=N> ID_Stu[NObs];     // student id
  int<lower=1,upper=NFam> ID_Fam[NObs];  // item family id
  int<lower=0,upper=1> Resp[NObs];       // response
  //real x[NObs];                        // design matrix
  int<lower=1> NIC;                      // number of item clones, fixed in this syntax
  int<lower=1,upper=NIC> ID_Clone[NObs]; // item clone id
}
parameters {
  real theta[N];                         // ability
  real mu[NFam];                         // difficulty
  real<lower=0,upper=5> alpha[NFam];     // discrimination
  //real lambda[I];                      // covariate
  real<lower=0> sigma[NFam];             // random variation
  vector[NIC] beta[NFam];                // item clone difficulties
}
model {
  vector[NObs] alpha_local;              // temporary variables
  vector[NObs] diff_local;
  int fam_n;
  theta ~ normal(0, 1);                  // priors
  mu ~ normal(0, 10);
  //lambda ~ normal(0, 10);
  sigma ~ cauchy(0, 0.3);
  for (i in 1:NFam) {
    beta[i] ~ normal(mu[i], sigma[i]);
  }
  for (n in 1:NObs) {
    fam_n = ID_Fam[n];                   // '=' replaces the deprecated '<-'
    alpha_local[n] = alpha[fam_n];
    diff_local[n] = theta[ID_Stu[n]] - beta[fam_n, ID_Clone[n]];
  }
  Resp ~ bernoulli_logit(alpha_local .* diff_local);
}
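For reference, the per-observation probability the model block above encodes is P(Resp = 1) = inv_logit(alpha[fam] * (theta[stu] - beta[fam, clone])). A quick numeric sketch (Python, with made-up parameter values, just to illustrate the 2PL link):

```python
import math

def inv_logit(x):
    # Standard logistic function, as in Stan's bernoulli_logit
    return 1.0 / (1.0 + math.exp(-x))

# 2PL: P(correct) = inv_logit(alpha * (theta - beta)), toy values
alpha, theta, beta = 1.2, 0.5, -0.3
p = inv_logit(alpha * (theta - beta))
print(p)
```

Higher discrimination alpha steepens the curve around the item difficulty beta; at theta = beta the probability is exactly 0.5.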
The rstan model fitting looks like this:
NObs <- N * NFam * NItemClone             # number of observations (defined first,
                                          # since list() elements cannot refer to each other)
data <- list(
  N = N,                                  # number of students
  NFam = NFam,                            # number of item families
  NObs = NObs,
  Resp = as.vector(t(irv)),               # person-by-person response vector
  ID_Stu = seq(0, NObs - 1) %/% (NFam * NItemClone) + 1,              # student ID
  ID_Fam = rep(seq(0, NFam * NItemClone - 1) %/% NItemClone + 1, N),  # item family ID
  NIC = NItemClone,                       # number of item clones
  ID_Clone = rep(1:NItemClone, NObs / NItemClone)                     # item clone ID
)
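To sanity-check the index arithmetic, here is a small sketch (in Python, mirroring the R integer-division logic, with toy sizes I made up): each student should contribute NFam * NItemClone consecutive observations, cycling through families and clones.

```python
# Reproduce the ID_Stu / ID_Fam / ID_Clone layout with toy sizes
N, NFam, NItemClone = 3, 2, 2
NObs = N * NFam * NItemClone

# Same formulas as the R code: %/% is integer division, indices are 1-based
ID_Stu   = [n // (NFam * NItemClone) + 1 for n in range(NObs)]
ID_Fam   = [(n % (NFam * NItemClone)) // NItemClone + 1 for n in range(NObs)]
ID_Clone = [n % NItemClone + 1 for n in range(NObs)]

print(ID_Stu)    # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
print(ID_Fam)    # [1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2]
print(ID_Clone)  # [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
```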
fit1 <- stan(
  file = paste0(path, "TwoPL.stan"),     # Stan program
  data = data,                           # named list of data
  chains = 2,                            # number of Markov chains
  warmup = 200,                          # number of warmup iterations per chain
  iter = 1200,                           # total number of iterations per chain
  thin = 10,                             # keep every 10th draw
  cores = 4,                             # number of cores
  refresh = 1,                           # show progress every 'refresh' iterations
  control = list(max_treedepth = 20, adapt_delta = 0.90)
)
Sometimes I get divergent transitions, and the warning message recommends increasing the acceptance rate, which is already 0.90. I thought that if the acceptance rate were too high, it would introduce more noise into the estimates, because we would be accepting almost every proposal. When there is no divergence problem, I may instead get chains with really bad mixing, such as the following: 10_10_0.3_TwoPL_Mu_Traceplot_1.pdf (18.0 KB). For mu[2] and mu[7] the band is really wide, while the other parameter estimates look OK. Why would that happen?
Some of the autocorrelation plots look really bad too. I am not sure why, because I kept only every 10th draw, so I thought the autocorrelation should be low for all the estimates. 20_10_0.3_TwoPL_Sigma_ACplot_1.pdf (9.8 KB)
I also spent a couple of days just playing with the number of iterations and chains. I read in the literature that an effective sample size of 10,000 is desirable for social science research, but my effective sample sizes across the 2 chains are around 100 per parameter. When there are divergent chains, some parameters end up with an effective sample size of 20 to 30, which is also reflected in a very high standard error of the estimate. This is one output I had. If each chain is 1000 iterations long after a warmup of 200, and I thin every 10th draw, with 2 chains, should I ideally get about 200 samples if no autocorrelation is present?
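The arithmetic above can be checked directly (a quick sketch; the effective sample size equals the number of saved draws only in the ideal case where the thinned draws are uncorrelated):

```python
# Post-warmup draws saved by rstan, per chain and in total,
# given the settings in the stan() call above
iter, warmup, thin, chains = 1200, 200, 10, 2

draws_per_chain = (iter - warmup) // thin   # saved draws per chain
total_draws = draws_per_chain * chains      # upper bound on effective sample size

print(draws_per_chain, total_draws)  # 100 200
```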
So I would appreciate it if anybody could take a look at my code and give me some suggestions on how to choose the number of chains and iterations for IRT models, the step size (related to max_treedepth), and the acceptance rate. Would 0.95 be too high? How big or small can max_treedepth be? My actual treedepth is less than 10 (for example: 10_10_0.3_TwoPL_Diagnostics_1.txt (1.9 KB)), so I specified 20. If I make Stan take big steps and accept proposals at a high rate like 0.95, would that mean it converges faster? In practice it does not: the calibration I attached here took about 4 hours. :-( I have just learned a bit about Bayesian computation, so I have many dumb questions. Any feedback would be highly appreciated!