IRT model with random items

Dear all,

I am trying to implement a 2PL IRT model with random effects for the discrimination (\alpha) and difficulty (\beta) parameters. I having trouble fitting the model to simulated data, and I am not sure I have coded the model correctly.

Let me start with my motivation behind the model. I have data on about 20 items for a hundred countries for the last 40 years. So I have country-years are nested on countries. I think the questions are not equally relevant across countries. I want to allow the discrimination and difficulty parameters to vary across countries. In IRT lingo, I want to generalize the usual 2PL setup to this: logit[Pr(y=1)]=\alpha_{item, country}\theta_{country,year}-\beta_{item,country}

I have been looking at the case studies and previous posts to get a sense of how to implement this model. I came up with the code below. I try to use a non-centered parameterization for \alpha to help with divergences, and some constraints on \beta to help with identification. In an effort to start simple, I’m currently just trying to get random effects for \alpha while keeping \beta constant across countries.

data { 
  int<lower=1> N;            // number of observations
  int<lower=1> I;            // number of items 
  int<lower=1> C;            // number of countries 
  int<lower=1> K;            // number of unique random effects
  int<lower=1> J;            // number of country-years 
  int<lower=1,upper=I> ii[N];  // item for response n 
  int<lower=0,upper=C> cc[N];  // country for response n  
  int<lower=0,upper=J> jj[N];  // country-year for response n 
  int<lower=1,upper=K> kk[N];  // random effect for response n
  int<lower=0,upper=1> y[N];   // response n

parameters {
  vector[I-1] beta_free; //freely estimated difficulty parameters
  vector[J] theta;    // Ability parameter for country-year              		
  vector[I] a_mean;   // Mean discriminations for each item 
  vector[K] a_offset; // Discrimination offsets for each country
  vector<lower=0> [I] a_sigma; //St dev. for discrimination offsets 
transformed parameters{
  vector [I] beta; // contains freely estimated and constrained betas
  vector[N] alpha; //Overal discriminations, including random effects 
  vector[N] eta;  // log odds of a correct probability
  beta[1:(I-1)] = beta_free;
  beta[I] = -1*sum(beta_free);  // Constrain  betas to help with identification 
  alpha = a_mean[ii]+ a_offset[kk] .* a_sigma[ii];  // non-centered
  eta = alpha .*theta[jj]-beta[ii];
model { 
  theta ~ normal(0,1);
  target += normal_lpdf(beta | 0, 3);
  a_mean ~ normal(0,1);
  a_offset ~ normal(0,1);
  a_sigma ~  inv_gamma(1,1);
  y ~ bernoulli_logit(eta); //likelihood  

I get lots of divergent iterations when I use a small simulated dataset to test the mode. Increasing adapt_delta to .99 helps a bit but I still get a lot of them. Bigger datasets don’t get divergences but the effective number of samples is really small and most \bar{R} remain above 1.1 despite using 5 chains with 3,000 iterations.

I had a lot of trouble with setting the random effects (line 6 under transformed parameters). I am not sure that I got it right. I tried using a non-centered parametrization but maybe I need to use Cholesky decomposition? Still trying to learn how to do this properly.

I am attaching my R script where I simulate the data. I think I did the simulation right but its my first time trying to simulate this kind of data.

I would appreciate any suggestions!
r script data simulation random2pl.R (2.7 KB)

I didn’t scrutinize your model closely (and I have no experience with IRT), but the general advice is to start with a simple model and add complexity one step at a time and make sure your model still works after each step. Debugging a larger model all at once is often almost impossible. (in your case you could for example start by removing beta and theta and see how the model works with equally simplified simulated data).

Alternatively, it should be possible to fit IRT models with brms ( might be a good starting point). Since brms is generally well tested and does a bunch of clever tricks, you are likely to avoid a huge part of the difficulty in rolling out your own Stan code.

Your item discrimination parameters alpha are not constrained to be positive. In this case the 2PL is not identified.

Not quite sure how to handle that constraint with the non-centered parametrization of alpha though. Maybe try using a centered parametrization first with vector<lower=0>[I] alpha; in the parameters block.

Thank you both for your suggestions.

You were right that the model was not identified. Adding the constraint on alpha didn’t quite do the trick though. According to Fox(2010), I need a few more constraints to identify the model. I rewrote my model following Fox and it seems to work better. However, I am having a hard time coding some of the trickier constraints.

Overall, I need to constrain a vector so it sums to zero for each group in the dataset. I have seen others (@Bob_Carpenter, @LucC ) suggest to constrain a whole vector to sum to zero like this:

parameters {
  vector[K-1] b_free;
transformed parameters {
  vector[K] b;
  b[1:(K-1)] = b_free;
  b[K] = -1*sum(b_free); 

But I haven’t been able to figure out how to do this for each group (i.e. \Sigma_k\beta_{kj}=0). The above method doesn’t cut it because I need to constrain several b and they don’t all come at the end of the vector. Any thoughts on how to implement this?

Well you can always have N_total - N_groups raw variables and store the position last elements of each group and then copy the raw variables directly to “actual” variables, skipping the last elements and then compute the last elements. This would require a bunch of for loops and it would be easy to do some of-by-one errors, but you should be able to get it going relatively easily.

Further, this simple trick is not always well-behaved, but there are some more complex solutions. One is to have a soft constraint, something like:

for(group in 1:N_groups) {
   sum(b[group]) ~ normal(0, 0.01 * N_group_members[group]); 

Or, even better, but more complex is described here: