Distractor Choice IRT Model

Hi,

I am trying to fit a type of nominal response IRT model; the details of the model are at this link

Here is a link to a screenshot of the actual model:

This model extends Bock’s original nominal response IRT model by adding a guessing parameter. I did my best to write the model syntax, but it does not seem to be right: I keep getting error messages. Below is my attempt at fitting this model.

data{
   int<lower=1, upper=5> K;
   int<lower=0> n_student;
   int<lower=0> n_item;
   int<lower=1,upper=K> Y[n_student,n_item];
 }

parameters {
   matrix[n_item,K] a2; 
   matrix[n_item,K] b2; 
   matrix[n_item,K] c2;
   vector[n_student] theta;
}

transformed parameters {
   matrix[n_item,K] a; 
   matrix[n_item,K] b; 
   matrix[n_item,K] c;

    // Here I am trying to impose a constraint so that a, b, and c are all equal
    // to 0 for the highest response category; this is how it was done in the
    // original paper.

    for (j in 1:n_item) {
     for(k in 1:K){
        a[j,k]=a2[j,k]-a2[j,K]; 
        b[j,k]=b2[j,k]-b2[j,K]; 
        c[j,k]=c2[j,k]-c2[j,K]; 
     }
    }
}

model{
   real P[n_student,n_item,K];
   real Pr[n_student,n_item,K];

  theta ~ normal(0,1);

  for (j in 1: n_item){
   for(k in 1:K) {
    a2[j,k] ~ uniform(.5,3);
    b2[j,k] ~ uniform(-3,3);
    c2[j,k] ~ uniform(0,.5);
  }
  }


  for (i in 1:n_student){

    for (j in 1:n_item){
        for (k in 1:K) { 
           P[i,j,k] = ((1-c[j,k])*exp(a[j,k]*(theta[i]-b[j,k])))/(1+(c[j,k]*(exp(a[j,k]*(theta[i]-b[j,k])))));
         }

        for (k in 1:K) {
           Pr[i,j,k] = P[i,j,k]/sum(P[i,j,]);
         }
        Y[i,j] ~ categorical(Pr[i,j,]);
    }
}
}

Below is the error message:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

No matches for:

real ~ categorical(real[])

Available argument signatures for categorical_logit:

int ~ categorical(vector)
int[] ~ categorical(vector)

require real scalar return type for probability function.

ERROR at line 56

54: }
55:
56: Y[i,j] ~ categorical(Pr[i,j,]);
^
57: }

Error in stanc(file = file, model_code = model_code, model_name = model_name, :
failed to parse Stan model ‘190394253d80ccdcf6735089165b5cde’ due to the above error.

I am also aware that this is probably not the most efficient way, as I use so many loops. I saw in other examples that you can vectorize things to reduce the number of loops, but it will take me some time to understand how arrays work in the Stan language. I would also appreciate any help rewriting this more efficiently. I attached a simulated dataset based on this model for your reference.

Thanks all for your time and help.

Cengiz
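
On the parser error and the request for fewer loops: `categorical` accepts only a `vector` of probabilities, not a `real[]` array, which is what the "No matches for" message is reporting. One possible way to address both points is sketched below. It is not the method from the paper. The function name `dcm_log_weights` is made up for illustration, and the sketch assumes each item's parameters are stored as a vector (for example `vector[K] a[n_item];`) rather than as rows of a `matrix[n_item,K]`, where `a[j]` would be a `row_vector`. It returns the log of the unnormalized weights from the posted formula, so `categorical_logit` can do the normalization that the `P[i,j,k]/sum(P[i,j,])` loop did by hand.

```stan
functions {
  // Hypothetical helper (not part of Stan): log of the unnormalized
  // category weights from the posted formula,
  //   P[k] = (1 - c[k]) * exp(z[k]) / (1 + c[k] * exp(z[k])),
  // where z[k] = a[k] * (theta - b[k]).
  vector dcm_log_weights(vector a, vector b, vector c, real theta) {
    vector[rows(a)] z = a .* (theta - b);
    // Finite as long as 0 <= c[k] < 1 for every category.
    return log1m(c) + z - log1p(c .* exp(z));
  }
}
```

Inside the student and item loops, the two inner `k` loops and the sampling statement could then collapse to `Y[i,j] ~ categorical_logit(dcm_log_weights(a[j], b[j], c[j], theta[i]));`. This works because the softmax of the log weights equals the weights divided by their sum. The arrays `P` and `Pr` are no longer needed.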

OK, I am still working on this. It seems I made some progress. Below is the most recent version of my code. It does not give me a syntax error. However, I am now dealing with this error message:

Rejecting initial value:
Error evaluating the log probability at the initial value.

Any help is really appreciated.

Thank you.

cengiz

data{
   int<lower=0> n_student;
   int<lower=0> n_item;
   int<lower=1,upper=5> Y[n_student,n_item];
}

parameters {
   vector[4] a2[n_item]; 
   vector[4] b2[n_item]; 
   vector[4] c2[n_item];
   vector[n_student] theta;
}

transformed parameters {
   vector[5] a[n_item]; 
   vector[5] b[n_item]; 
   vector[5] c[n_item];

  for (j in 1: n_item){
    a[j,1:4]=a2[j,1:4];
    a[j,5] = 0;  

    b[j,1:4]=b2[j,1:4];
    b[j,5] = 0;

    c[j,1:4]=c2[j,1:4];
    c[j,5] = 0;
  }

}

model{

   vector[5] P[n_student,n_item];
   vector[5] Pr[n_student,n_item];

   theta ~ normal(0,1);
   for (j in 1: n_item){
     a2[j] ~ normal(0,1);
     b2[j] ~ normal(0,1);
     c2[j] ~ uniform(0,.5);
   }
 
  for (i in 1:n_student){
    for (j in 1:n_item){
        for (k in 1:5) { 
           P[i,j,k] = ((1-c[j,k])*exp(a[j,k]*(theta[i]-b[j,k])))/(1+(c[j,k]*(exp(a[j,k]*(theta[i]-b[j,k])))));
         }

        for (k in 1:5) {
           Pr[i,j,k] = P[i,j,k]/sum(P[i,j,]);
         }

        Y[i,j] ~ categorical(Pr[i,j,]);
    }
  }
}
# R Code

data_list <- list(K=5,n_student=nrow(d),n_item=ncol(d),Y=d)

dcm <- stan(model_code=code_DCM, data = data_list, iter = 50, chains = 1)

SAMPLING FOR MODEL 'b97caecf75d38252a9ed678c20bc15de' NOW (CHAIN 1).

Rejecting initial value:
  Error evaluating the log probability at the initial value.

Rejecting initial value:
  Error evaluating the log probability at the initial value.

Rejecting initial value:
  Error evaluating the log probability at the initial value.

I hope the message is self-explanatory. You are getting NaN or infinite values for your density.

The most likely culprits are failing to initialize values you have declared or providing illegal inputs, such as categorical values that start at 0 rather than 1.

P.S. You can use a triple back-tick (```) on its own line before and after code to set it off. I wish I knew how to turn off the syntax highlighting as it doesn’t match Stan. I edited your first post to demonstrate. The residual lack of spacing was in the original, so I didn’t modify that.

P.P.S. We recommend putting your Stan programs in their own files so that error message line numbers make sense.
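
One candidate cause in the second program, offered as an assumption rather than a confirmed diagnosis: `c2` is declared without bounds but is given a `uniform(0,.5)` prior. By default Stan draws initial values uniformly from (-2, 2) on the unconstrained scale, so some element of `c2` will almost always start outside (0, 0.5), where the uniform log density is negative infinity. A minimal sketch of the usual remedy, cut down to the one affected parameter, is to declare bounds that match the prior's support:

```stan
data {
  int<lower=0> n_item;
}
parameters {
  // Declared bounds match the support of the uniform(0, .5) prior, so
  // every initial value Stan proposes has a finite log density.
  vector<lower=0, upper=0.5>[4] c2[n_item];
}
model {
  for (j in 1:n_item)
    c2[j] ~ uniform(0, .5);
}
```

With the bounds in place the `uniform` statement is redundant, since a bounded parameter with no explicit prior is already uniform over its bounds. The same reasoning would apply to `a2`, `b2`, and `c2` in the first program, which also pair uniform priors with unbounded declarations. Newer Stan releases write the array declaration as `array[n_item] vector<lower=0, upper=0.5>[4] c2;`.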

Thanks, Bob. I am new to Stan and to the forum. I appreciate your notes and suggestions.

I am still working on the model and have made more progress. I can now run the model and get output, but the estimated parameters still do not match the parameters I used to simulate the data. I am probably still doing something wrong.

I will post an update if I can figure this out, so it can serve as a future reference for anyone interested in this model.

cengiz

Great. If you actually sample from the model you’ve defined and feed that back in, then you should get calibrated answers on average. So you want to keep going until you get that right.
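
A rough sketch of what simulating from the model could look like, using only a `generated quantities` block. It assumes the priors and the zero-constrained fifth category from the second program posted above. The dataset attached to the first post may have been generated with different values, so this illustrates the check and does not reproduce that dataset. The local name `ez` is introduced here for readability.

```stan
data {
  int<lower=0> n_student;
  int<lower=0> n_item;
}
generated quantities {
  vector[n_student] theta;
  vector[5] a[n_item];
  vector[5] b[n_item];
  vector[5] c[n_item];
  int<lower=1, upper=5> Y[n_student, n_item];

  // Draw abilities and item parameters from the priors.
  for (i in 1:n_student)
    theta[i] = normal_rng(0, 1);

  for (j in 1:n_item) {
    for (k in 1:4) {
      a[j, k] = normal_rng(0, 1);
      b[j, k] = normal_rng(0, 1);
      c[j, k] = uniform_rng(0, 0.5);
    }
    // Reference category fixed at zero, as in transformed parameters.
    a[j, 5] = 0;
    b[j, 5] = 0;
    c[j, 5] = 0;
  }

  // Draw responses from the normalized category probabilities.
  for (i in 1:n_student) {
    for (j in 1:n_item) {
      vector[5] P;
      for (k in 1:5) {
        real ez = exp(a[j, k] * (theta[i] - b[j, k]));
        P[k] = (1 - c[j, k]) * ez / (1 + c[j, k] * ez);
      }
      Y[i, j] = categorical_rng(P / sum(P));
    }
  }
}
```

Because the program has no parameters, it would be run with `algorithm = "Fixed_param"` in `stan()`. Each iteration then yields one simulated dataset `Y` together with the `theta`, `a`, `b`, and `c` that produced it, which can be fed back into the fitting program and compared with the posterior.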