IRT model with varying number of response categories

I have a problem related to a previous thread here on the forums (here): namely, how to fit an IRT model (a graded response model, GRM) when items have varying numbers of response categories. The previous thread does not answer the question, hence I am seeking some help.

Here is my current model. It works fine when all items have the same number of response categories (e.g., all items are dichotomous, or all items have 4 response categories). Instead of using a matrix of observed scores like the other thread, I vectorized y so that I could remove all NAs (no need to impute them):

data{
  int<lower=1> C;                           // number of persons
  int<lower=1> I;                           // number of items
  int<lower=1> K;                           // number of response categories
  int<lower=1> N_obs;                       // number obs. scores (no NAs)
  array[N_obs] int<lower=1, upper=C> cc;    // person index
  array[N_obs] int<lower=1, upper=I> ii;    // item index
  array[N_obs] int<lower=1, upper=K> y_obs; // observed item scores
}

parameters {
  vector[C] theta;            // person abilities
  array[I] real<lower=0> alpha;  // item discriminations
  array[I] ordered[K-1] kappa;   // response category thresholds
  real mu_kappa;              // hyper-parameter
  real<lower=0> sigma_kappa;  // hyper-parameter
}

model{
  // Omitting priors on theta, alpha, mu_kappa, sigma_kappa.
  for (i in 1:I){
    for (k in 1:(K-1)) {
      kappa[i, k] ~ normal(mu_kappa, sigma_kappa);
    }
  }
  for (n in 1:N_obs) {
    y_obs[n] ~ ordered_logistic(alpha[ii[n]] * theta[cc[n]], kappa[ii[n]]);
  }
}

I need to extend the model above to the case where K is not constant. Say I have four items, two dichotomous and two with 4 response categories; then K is the vector (2, 2, 4, 4). How can I rewrite the program? Here is what I tried:

  • Declaring K in the data block as an array (array[I] int<lower=1> K;). But this breaks kappa in the parameters block.
  • Looking into how to set up ragged arrays in Stan. From what I could find, these are not directly supported. I did find this, but I don’t know how to implement it (especially in the parameters block).

Can someone point me in the right direction?
Thank you.

I only know the binary (correct/incorrect) IRT models. Do you have the model written down mathematically, so we can see what it means when items have different numbers of response categories? How do you interpret the student’s ability and the question’s difficulty? Is there discrimination? That is, is this an analogue of the IRT 1PL or 2PL model? Is the idea just to use the same linear predictor for different numbers of ordinal outcomes? That’s only going to make sense if the orderings are the same.

What you’ll need for this is two different sets of cutpoints—one for the binary case and one for the four-way case. The binary case can probably be simplified out of being an ordinal logit while you’re at it.
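If it helps to see that simplification: when K = 2 the GRM collapses to a Bernoulli, since P(y = 2) = inv_logit(alpha * theta - kappa). A minimal sketch (the names N_bin, y_bin, cc_bin, ii_bin, and delta are hypothetical, not from the model above):

model {
  for (n in 1:N_bin) {
    // y_bin is the dichotomous response recoded to 0/1;
    // delta[i] plays the role of the item's single cutpoint
    y_bin[n] ~ bernoulli_logit(alpha[ii_bin[n]] * theta[cc_bin[n]] - delta[ii_bin[n]]);
  }
}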

They’re not supported directly, but the following user’s guide chapter shows how to code them up using regular arrays:

https://mc-stan.org/docs/stan-users-guide/sparse-ragged.html#ragged-data-structs.section
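The pattern in that chapter, in brief: flatten the ragged structure into one long container, store the group sizes, and walk through it with a position counter. A generic sketch with made-up grouped-normal data (not the IRT model):

data {
  int<lower=1> G;           // number of groups
  array[G] int<lower=0> s;  // size of each group
  int<lower=0> N;           // total number of values, sum(s)
  vector[N] y;              // all values, groups concatenated
}
parameters {
  vector[G] mu;
  real<lower=0> sigma;
}
model {
  int pos = 1;
  for (g in 1:G) {
    segment(y, pos, s[g]) ~ normal(mu[g], sigma);  // this group's slice
    pos += s[g];
  }
}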

The GRM is an extension of the 2PL model. Say we have C individuals (c=1,\ldots,C) and I items (i=1,\ldots,I), and let the i-th item have K_i ordinal response categories. We define K_i-1 thresholds for the i-th item, one between each pair of consecutive response categories; denote these threshold parameters by \delta_{ik} for k=1,\ldots,K_i-1. The probability of individual c responding at or above category k of the i-th item is P^*_{cik}=\frac{\exp(a_i\theta_c-\delta_{i,k-1})}{1+\exp(a_i\theta_c-\delta_{i,k-1})} for k=2,\ldots,K_i, with the boundary conventions P^*_{ci1}=1 and P^*_{ci(K_i+1)}=0. The probability of individual c responding exactly at category k is then P_{cik}=P^*_{cik}-P^*_{ci(k+1)}.

Each individual’s ability is estimated through the joint likelihood over all items, which may have different numbers of answer options. The a_i parameters are the discriminations.
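For what it’s worth, this is exactly the parameterization Stan’s ordered_logistic uses, with linear predictor a_i\theta_c and cutpoint vector (\delta_{i1},\ldots,\delta_{i,K_i-1}): combining the two equations above gives P_{cik}=\mathrm{logit}^{-1}(a_i\theta_c-\delta_{i,k-1})-\mathrm{logit}^{-1}(a_i\theta_c-\delta_{ik}), with the conventions \delta_{i0}=-\infty and \delta_{iK_i}=+\infty.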

In the meantime I tried what you suggested (setting up two different sets of cutpoints). What I did was essentially to double the data and parameters: one set of each for the dichotomous items and another for the polytomous items (all with 5 answer options, luckily). The model then boils down to essentially this:

(...)
  for (n in 1:N_obs2) {
    y_obs2[n] ~ ordered_logistic(alpha2[ii2[n]] * theta[cc2[n]], kappa2[ii2[n]]);
  }
  for (n in 1:N_obs5) {
    y_obs5[n] ~ ordered_logistic(alpha5[ii5[n]] * theta[cc5[n]], kappa5[ii5[n]]);
  }

This looks very cumbersome to me, but it does seem to work. I am a bit wary of some of the thresholds being rather small or large compared to the fit of the frequentist model from the ltm package. Is this what you meant?

As for the ragged arrays, the link you provided is the same as the one I gave in the OP.

Thanks a lot for helping me out; let’s see whether you can advise me further.

Thanks for the explanation.

Sorry Stan isn’t more elegant! There is one thing you can do to help somewhat, though, which is to convert

for (n in 1:N_obs2) {
  y_obs2[n] ~ ordered_logistic(alpha2[ii2[n]] * theta[cc2[n]], kappa2[ii2[n]]);
}

to

y_obs2 ~ ordered_logistic(alpha2[ii2] .* theta[cc2], kappa2[ii2]);

I just wanted to make sure they’re on the same scale. For example, 0/1 responses for bad food/good food are not consistent with 1/2/3/4/5 responses terrible ambience/bad ambience/OK ambience/good ambience/great ambience.

Ack. I lied. We haven’t yet vectorized the ordered logistic, so it has to be done in a loop. :-(

Thanks a lot Bob.

Btw, by ‘cumbersome’ I meant ‘I’m really not sure I did this the right way’.
Stan itself is absolutely incredible!

OK, I will proceed from here as it is. I’ll make sure to generate predictions from the model to see how reasonable they look.

I’ve played around with this in the past. The solution I came up with (which I can’t claim is efficient or problem-free) is to manually create one large vector of all the thresholds (i.e. for all variables) from an unconstrained vector of values, a vector of location parameters, and a vector of scale parameters. You can then index the threshold vector based on the item (as suggested by @Bob_Carpenter above re how to handle ragged arrays, though here subsetting the thresholds rather than the observed data array). I’ll see if I can find my old code later, but here’s the general idea.

For one variable with K categories, you need K parameters (to avoid clashing with the abilities \theta and discriminations \alpha above, I’ll write these as \gamma, \lambda, and \sigma):

  1. An unconstrained vector \gamma of length K-2, with elements in (-\infty,\infty)
  2. A location parameter \lambda \in [0,1]
  3. A scale parameter \sigma \in (0,\infty)

You can create a simplex of length K-2 from \gamma using the softmax function, and turn it into an ordered vector \tau \in [0,1] by taking the cumulative sum of the simplex values and prepending a first element equal to 0. You can then create your thresholds by shifting \tau by \lambda and scaling by \sigma: \kappa = (\tau - \lambda) \times \sigma. This gives you an ordered vector of length K-1 with elements in (-\infty, \infty). For example, with K=4 and simplex values (s_1, s_2), you get \tau = (0, s_1, 1) and \kappa = (-\lambda\sigma, (s_1-\lambda)\sigma, (1-\lambda)\sigma).

Extending to multiple variables with varying values of K is fairly straightforward. You combine the parameters for all variables into a single vector for \gamma, one for \lambda, and one for \sigma. Repeating the process above for each variable yields a single vector of thresholds \kappa which is ordered within each variable. Then you can use indices to ensure the correct thresholds are used for each variable.

functions{
    vector threshold_maker(vector gamma, vector lambda, vector sigma, array[] int Ks, array[] int Ke){
        ...
    }
}
data{
    ...
   int<lower = I*2> NK;  // Total number of levels across all variables
   array[I] int<lower = 1, upper = NK-I> K_start; // Index for first threshold
   array[I] int<lower = 1, upper = NK-I> K_end; // Index for final threshold
}
parameters{
    ...
    vector[NK - I * 2] gamma;                // unconstrained raw threshold values
    vector<lower = 0, upper = 1>[I] lambda;  // location parameters
    vector<lower = 0>[I] sigma;              // scale parameters
}
transformed parameters{
    ...
    vector[NK - I] kappa = threshold_maker(gamma, lambda, sigma, K_start, K_end);
}
model{
    ...
    for (n in 1:N_obs) {
       int i = ii[n];   // Item index
       y_obs[n] ~ ordered_logistic(alpha[i] * theta[cc[n]], kappa[K_start[i]:K_end[i]]);
    }
}
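For concreteness, here is one way threshold_maker could be written following the recipe above (a sketch I haven’t tested; it assumes gamma stacks the K_i - 2 raw values per item in item order, and that K_start/K_end give each item’s first and last threshold position within kappa):

functions {
  vector threshold_maker(vector gamma, vector lambda, vector sigma,
                         array[] int Ks, array[] int Ke) {
    int I = num_elements(lambda);
    vector[Ke[I]] kappa;  // one slot per threshold, across all items
    int pos = 1;          // current read position in gamma
    for (i in 1:I) {
      int n_thresh = Ke[i] - Ks[i] + 1;  // item i has K_i - 1 thresholds
      vector[n_thresh] tau;
      tau[1] = 0;
      if (n_thresh > 1) {
        // softmax of the K_i - 2 raw values gives a simplex; its
        // cumulative sum is an increasing vector ending at 1
        tau[2:n_thresh] = cumulative_sum(softmax(segment(gamma, pos, n_thresh - 1)));
      }
      kappa[Ks[i]:Ke[i]] = (tau - lambda[i]) * sigma[i];  // shift and scale
      pos += n_thresh - 1;
    }
    return kappa;
  }
}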

I’ll see if I can code up a simple version of this later today. In the meantime, you might also look at how Stan creates ordered vectors behind the scenes. Replicating that process might be more efficient than the version I have described above.


Related to the above post, I find the reference manual page on constraint transforms to be really helpful, starting at the “ordered vector” section. The priors on the threshold parameters become especially tricky, because you also need to consider the Jacobian. I tried to describe some intuition about it in the “order constraints” section of this paper.
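To make the transform concrete: Stan builds an ordered vector by keeping the first raw element and adding positive increments exp(.) for the rest, and the log Jacobian determinant is simply the sum of the raw increments. Replicated by hand it looks roughly like this (a sketch; kappa_raw is a made-up name and the prior line is just an example):

parameters {
  vector[K - 1] kappa_raw;  // unconstrained
}
transformed parameters {
  vector[K - 1] kappa;      // ordered by construction
  kappa[1] = kappa_raw[1];
  for (k in 2:(K - 1)) {
    kappa[k] = kappa[k - 1] + exp(kappa_raw[k]);  // positive increments
  }
}
model {
  // Jacobian adjustment so the prior applies on the constrained scale
  target += sum(kappa_raw[2:(K - 1)]);
  kappa ~ normal(mu_kappa, sigma_kappa);
}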
