We want to estimate the parameters of a zero-inflated mixture IRT model, similar to the zero-inflated Poisson model (https://mc-stan.org/docs/2_20/stan-users-guide/zero-inflated-section.html) but with multivariate ordinal data. There are 9 items, each with 4 ordered response categories. A large proportion of people in the sample endorse 0 for all 9 items. In our simulated data, 160 of N = 500 people have all-0 response patterns.
The IRT component of the mixture model predicts some proportion of all-0 response patterns (sampling 0s) — that is, some of the 160 all-0 response patterns come from the IRT model. We will refer to the individuals whose responses come from this standard IRT model as belonging to Class 1. The remaining all-0 response patterns belong to a latent class that is degenerate, in that all members of this class respond 0 to each item with probability 1 (structural 0s). We will call this Class 0. It is the all-0 responses from Class 0 that are responsible for the zero inflation.
In our code, we use an indicator variable dat_bin[I], such that dat_bin[i] == 1 if the respondent endorses 0 for all 9 items. These individuals could belong to either Class 0 or Class 1. If dat_bin[i] == 0, the individual must belong to Class 1. The parameter theta_zi is the proportion of the entire sample (N = 500, all response patterns) that is estimated to belong to Class 0. It is not possible to identify which all-0 respondents belong to Class 0 vs. Class 1; however, we should be able to estimate the proportion of the entire sample that belongs to each class (theta_zi = 0.08 in our simulated data). We are able to do this with ML. When we use the attached code, the mean, sd, and credible interval for theta_zi all come to out be exactly 0. This suggests that none of the all-0 response patterns are being treated as structural 0s and that the zero-inflated part of the mixture model is being ignored.
This type of test zero inflation comes from this paper " IRT Modeling in the Presence of Zero-Inflation With Application to Psychiatric Disorder Severity", which does the test zero inflation 2PL model with ML in Mplus 0146621615588184.pdf (750.0 KB)
Here is the Stan model, and I am attaching simulated data TISimData.csv (9.3 KB) set , and full R code to replicate the issue
TestInflation.R (2.3 KB)
Any help is appreciated on how to specify the multivariate test inflation
Thank you
data{
int n; //number of subjects
int p; // number of items
int K; // number of categories
int x[n,p]; // categorical data matrix
int x_bin[n]; // indicator variable for all-0 responses
}
parameters{
real<lower=0, upper=5> alpha[p]; // discrimination for each item
ordered[K-1] kappa[p]; // category thresholds for each item
real theta[n]; // factor scores
real<lower=0, upper=1> theta_zi; // test zero inflation
real mu_kappa; //mean of the prior distribution of thresholds
real<lower=0> sigma_kappa; //sd of the prior distribution of thresholds
real<lower=0> sigma_alpha; //sd of the prior distribution of discrimination
}
transformed parameters{
vector[K-1] thresholds[p];
for(i in 1:p){
for(k in 1:(K-1)){
thresholds[i,k] = kappa[i,k]/alpha[i]; // IRT threshold parameterization
}
}
}
model{
for(i in 1:n){
for(j in 1:p){
if (x_bin[i] == 1){
target += log_sum_exp(bernoulli_lpmf(1 | theta_zi),
bernoulli_lpmf(0 | theta_zi) + ordered_logistic_lpmf(x[i,j] | alpha[j] * theta[i], kappa[j]));
}else{
target += bernoulli_lpmf(0 | theta_zi) + ordered_logistic_lpmf(x[i,j] | alpha[j] * theta[i], kappa[j] );
}
}
}
for(i in 1:p){
for(k in 1:(K-1)){
kappa[i,k] ~ normal(mu_kappa, sigma_kappa);
}
}
mu_kappa ~ normal(0,5);
sigma_kappa ~ cauchy(0,5);
alpha ~ lognormal(0, sigma_alpha);
sigma_alpha ~ cauchy(0,5);
theta ~ normal(0,1);
}