Hi! I am a baby Stan modeler, and I have been trying to fit a polytomous IRT model, the nominal response model, to simulated response data.
The nominal response model (NRM; Bock, 1972) models the probability that examinee $j$, with latent trait $\theta_j$, selects category $k$ of item $i$ as
$$P(u_{ij}=k \mid \theta_j) = \frac{\exp(a_{ik}\theta_j + c_{ik})}{\sum_{h=1}^{m_i}\exp(a_{ih}\theta_j + c_{ih})},$$
where $m_i$ is the number of categories of item $i$.
In the NRM, each category $k$ of item $i$ has its own slope $a_{ik}$ and intercept $c_{ik}$, so each item has as many slope and intercept parameters as it has categories. For example, an item with 4 categories has a vector of 4 slopes and a vector of 4 intercepts.
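To make the formula concrete, here is a small Python sketch (not part of my Stan code; the item parameters are made up for illustration) that computes the category probabilities for one item. It also checks a property that I believe is why the slopes and intercepts are centered in my model: adding the same constant to all of an item's slopes, or to all of its intercepts, does not change the probabilities, so the parameters are only identified up to a constraint such as "sum to zero within each item".

```python
import math

def nrm_probs(theta, a, c):
    """Category probabilities P(u = k | theta) for one item under the NRM.

    a, c: per-category slopes a_ik and intercepts c_ik (same length m_i).
    The max logit is subtracted before exp() so large logits cannot overflow.
    """
    logits = [a_k * theta + c_k for a_k, c_k in zip(a, c)]
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical 4-category item (values made up for illustration).
a = [-1.0, -0.2, 0.4, 0.8]
c = [0.5, 0.3, -0.1, -0.7]
p = nrm_probs(0.5, a, c)
print([round(x, 3) for x in p], round(sum(p), 6))

# Shift every slope by +3 and every intercept by -2: each logit moves by the
# same amount (3 * theta - 2), so the softmax probabilities are unchanged.
p_shift = nrm_probs(0.5, [x + 3.0 for x in a], [x - 2.0 for x in c])
print(all(abs(x - y) < 1e-12 for x, y in zip(p, p_shift)))
```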
So I implemented the NRM in Stan and fit it to the attached simulated response data (Fatigue_NRM_sim_resp_20.csv) as follows:
```r
resp <- resp + 1  # shift categories so they start at 1, as categorical_logit requires
N <- nrow(resp)
T <- ncol(resp)
data_nrm <- list(n_student = N, n_item = T, response = resp, K = 5)
nrm <- "
data {
  int<upper=5> K;                                     // number of categories
  int<lower=0> n_student;                             // number of individuals
  int<lower=0> n_item;                                // number of items
  int<lower=1, upper=K> response[n_student, n_item];  // array of responses
}
parameters {
  vector[K] zeta[n_item];      // intercepts
  vector[K] lambda[n_item];    // slopes
  vector[n_student] theta;     // latent trait
}
transformed parameters {
  vector[K] zetan[n_item];     // centered intercepts (sum to zero within item)
  vector[K] lambdan[n_item];   // centered slopes (sum to zero within item)
  for (i in 1:n_item) {
    zetan[i] = zeta[i] - mean(zeta[i]);
    lambdan[i] = lambda[i] - mean(lambda[i]);
  }
}
model {
  theta ~ normal(0, 1);
  for (i in 1:n_item) {
    zeta[i] ~ normal(0, 4);
    lambda[i] ~ normal(0, 4);
  }
  for (i in 1:n_student) {
    for (j in 1:n_item) {
      response[i, j] ~ categorical_logit(zetan[j] + lambdan[j] * theta[i]);
    }
  }
}
"
```
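For anyone who does not read Stan, here is a rough Python re-implementation of just the likelihood part of the model block above. It is a sketch under my own assumptions: responses already recoded to 1..K, parameters stored as plain lists, priors left out, and a made-up toy data set.

```python
import math

def log_softmax(x):
    """Numerically stable log-softmax, the transform categorical_logit applies."""
    top = max(x)
    log_norm = top + math.log(sum(math.exp(v - top) for v in x))
    return [v - log_norm for v in x]

def center(v):
    """Subtract the mean, as the transformed parameters block does (zetan, lambdan)."""
    m = sum(v) / len(v)
    return [x - m for x in v]

def nrm_loglik(response, zeta, lam, theta):
    """Sum of log P(response[i][j]) over students i and items j.

    response: categories coded 1..K (as after resp + 1 in R).
    zeta[j], lam[j]: raw length-K intercept and slope vectors of item j.
    """
    total = 0.0
    for i, row in enumerate(response):
        for j, k in enumerate(row):
            zn, ln = center(zeta[j]), center(lam[j])
            eta = [z + l * theta[i] for z, l in zip(zn, ln)]
            total += log_softmax(eta)[k - 1]  # categories are 1-based, lists 0-based
    return total

# Tiny made-up example: 3 students, 2 items, K = 3 categories.
response = [[1, 3], [2, 2], [3, 1]]
zeta = [[0.2, 0.0, -0.2], [0.5, -0.1, -0.4]]
lam = [[-1.0, 0.1, 0.9], [-0.5, 0.0, 0.5]]
theta = [-1.0, 0.0, 1.2]
ll = nrm_loglik(response, zeta, lam, theta)
print(ll)
```

Because the centering happens inside `transformed parameters`, adding a constant to a raw `zeta[j]` or `lambda[j]` vector leaves this likelihood unchanged; only the `normal(0, 4)` priors pin down the means of the raw vectors.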
My questions are about convergence: the chains converged only occasionally. When they did converge, all the parameters appeared to be recovered well.

I tried increasing the number of iterations, but that alone did not help the chains converge. Why would more iterations not help in this case?

I also tried priors with a smaller scale, such as zeta[i] ~ normal(0, 1.5) and lambda[i] ~ normal(0, 1.5). That seemed to help at first, but repeated runs with exactly the same code and priors did not always converge. Why would the same code converge on one run but not on another?

When the chains did not converge, the problematic parameters were the slopes and theta; the intercepts almost always converged. I checked the posterior densities of the slope and theta parameters and found that many of them were bimodal, which makes me think the bimodality is the cause of the nonconvergence. What should I change to fix these bimodal posteriors?
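One thing I suspect might be related: in the model as written, slopes and theta enter the likelihood only through the products $a_{ik}\theta_j$, and all of the priors are symmetric around zero. If so, flipping the sign of every slope and every theta at the same time should give exactly the same posterior density, which would produce two mirror-image modes that different chains could settle into. The Python sketch below checks this numerically with made-up toy values; it re-implements my Stan model and priors by hand and is not Stan itself.

```python
import math

def loglik(response, zeta, lam, theta):
    """NRM log-likelihood with per-item centered parameters (mirrors the Stan model)."""
    total = 0.0
    for i, row in enumerate(response):
        for j, k in enumerate(row):
            zm = sum(zeta[j]) / len(zeta[j])
            lm = sum(lam[j]) / len(lam[j])
            eta = [(z - zm) + (l - lm) * theta[i] for z, l in zip(zeta[j], lam[j])]
            top = max(eta)
            log_norm = top + math.log(sum(math.exp(e - top) for e in eta))
            total += eta[k - 1] - log_norm  # categories coded 1..K
    return total

def normal_lpdf(x, sd):
    """Log density of normal(0, sd) at x."""
    return -0.5 * (x / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def log_posterior(response, zeta, lam, theta):
    """Unnormalized log posterior with the priors from my post:
    theta ~ normal(0, 1), zeta ~ normal(0, 4), lambda ~ normal(0, 4)."""
    lp = loglik(response, zeta, lam, theta)
    lp += sum(normal_lpdf(t, 1.0) for t in theta)
    lp += sum(normal_lpdf(v, 4.0) for item in zeta for v in item)
    lp += sum(normal_lpdf(v, 4.0) for item in lam for v in item)
    return lp

# Made-up toy data: 3 students, 2 items, K = 3 categories.
response = [[1, 3], [2, 2], [3, 1]]
zeta = [[0.2, 0.0, -0.2], [0.5, -0.1, -0.4]]
lam = [[-1.0, 0.1, 0.9], [-0.5, 0.0, 0.5]]
theta = [-1.0, 0.0, 1.2]

# Flip the sign of every slope and every theta at the same time.
lam_flip = [[-v for v in item] for item in lam]
theta_flip = [-t for t in theta]

lp = log_posterior(response, zeta, lam, theta)
lp_flip = log_posterior(response, zeta, lam_flip, theta_flip)
print(lp, lp_flip, abs(lp - lp_flip) < 1e-9)
```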
Thank you so much!
Sincerely,
Sue