Ordinal IRT model

Hi all,

I am running into trouble with convergence, with an ordinal IRT model (this thread is about the same model). I am trying to model public concern about climate change in different countries over time, based on survey data. The data looks like this:

country----question----option----resp. who picked option----total no. of respondents
Belgium—A--------------1----------345------------------------------1000
Belgium—A--------------2----------278------------------------------1000
etc.

So, for each country in each year, for each survey question in the data set, I know how many respondents picked each of the answer options. Questions could be on a scale from 1-4, or 1-10, or 1-2, or whatever. Each question is coded in such a way that lower responses mean greater concern about climate change. Data has 9000 rows and is based on 2.5 million responses (i.e., column 4 sums to 2.5 million).

The model consists of:

  • A random walk that describes the evolution of climate concern on each continent, plus a random walk for each country’s individual evolution. The country-year “ability” terms (alpha) that go into the IRT part are the sum of these two.
  • An ordered IRT model connecting abilities to respondents’ choices of answer options through an ordered logit model. It is similar to a 2PL IRT, with a difficulty (beta) and a discrimination (gamma) parameter. However, as the data is ordinal, the probability of picking an answer option is determined by comparing ability (alpha) to a set of thresholds.

I am pasting the full model below. Priors are (weakly) informative because earlier versions of this model only converged with such priors (see point 1. below for an example). With the data above, I don’t get any convergence at all–Rhats in the millions, effective sample sizes equal 2.002 for all parameters regardless of chain length. All chains look completely stuck in place. (happy to share specific outputs if it helps).

I have tried the following things:

  1. Binarizing the data by splitting each question’s scale in half. So now, all respondents who picked an option on the left side of the scale score 1, on the right hand side it’s 2. This works–the model converges.
  2. Binarizing all of the data except for two questions, for which I kept data about three answer options. No convergence.
  3. Fixing various parameters in the model instead of putting priors on them. I tried this for mu_beta, thres_scale (IRT parameters), lambda_epsilon, lambda_A and sigma_epsilon (parameters governing the variance in the ability terms). No convergence.
  4. Leaving out the continent random walk part. No convergence.
  5. Identifying the model by fixing the location and scale of the IRT parameters instead of the ability parameters (the model below is identified by fixing the standard deviation of the continent random walks to 1 and the mean of all random walks to 0). No convergence.
  6. Running the model on simulated data, where parameters are drawn from their priors. No convergence, even when I generate twice as much data as I actually have.

I am especially puzzled by what happens between 1. and 2. Apparently, even having just a few non-binary questions in the data breaks the model completely. What could be happening?

Thanks in advance,

Clara

 /**
 * Ordinal latent trait model for climate opinion with continent and country random walks
 */

data {

  int<lower=0> C; //number of countries
  int<lower=0> TT; //number of years
  int<lower=0> J;  //number of questions
  int<lower=0> G; //number of continents
  int<lower=0> CTJP; //number of observed country-year-question-options

  int<lower=0> y[CTJP]; //responses per observation (country-year-question-option)
  int<lower=0> p[CTJP]; //options of the observations
  int<lower=0> countries[CTJP]; //countries of the observations
  int<lower=0> years[CTJP]; //years of the observations
  int<lower=0> questions[CTJP]; //questions of the observations
  int<lower=0> c_continents[C]; //continents of the countries

  int<lower=0> P; //total number of thresholds
  int<lower=0> no_of_thres[J]; //number of thresholds for each question (no. of options - 1)
  int<lower=1> thres_start[J]; //index of starting point for question's thresholds

}


parameters {

  real A[G,TT]; //abilities, continent term
  real epsilon[C,TT]; //abilities, country term

  vector<lower=0>[P-J] steps; //steps between question thresholds
  real<lower=0> thres_scale[J]; //scale of the thresholds; should be positive

  real mu_beta; //mean difficulty
  real beta[J]; //difficulty
  real<lower=0> gamma[J]; //discrimination; should be positive

  //random walk step sizes and initial variation
  real<lower=0> lambda_A;
  real<lower=0> lambda_epsilon;
  real<lower=0> sigma_epsilon;

}


transformed parameters {

  real kappa; //combined ability standardization term
  real alpha[C,TT]; //abilities, combined term

  vector[P] thres; //question thresholds

  //standardization: last period alphas should have sd=1 a priori
  kappa = sqrt( lambda_A^2 + (TT - 1) + sigma_epsilon^2 * (lambda_epsilon^2 + TT - 1) );

  //combine continent and country terms into alphas
  for(c in 1:C){
    for(t in 1:TT){
      alpha[c,t] = ( A[ c_continents[c],t ] + epsilon[c,t] ) / kappa ;
    }
  }

  //reconstruct thresholds from step sizes and scale
  {
    int pos_steps  = 1;
    int pos_thres = 1;
    for(j in 1:J){ //for each question

      thres[pos_thres] = beta[j]; //set first threshold to difficulty
      if(no_of_thres[j]>1){
         for(i in 1:(no_of_thres[j]-1) ){ //set each next threshold...
           //...to previous thresh. plus step, scaled
           thres[pos_thres+i] = thres[pos_thres+i-1] + (steps[pos_steps+i-1] * thres_scale[j]);
         }
      }

      //thres_start[j] = pos_thres; //store starting point of thresholds for question
      pos_thres = pos_thres + no_of_thres[j]; //move index of threshold vector
      pos_steps = pos_steps + no_of_thres[j] - 1; //move index of steps vector

    }
  }

}


model {

  //continent random walk priors
  A[,1] ~ normal(0, lambda_A);
  for(t in 2:TT){
    A[,t] ~ normal(A[,t-1], 1);
  }

  //country random walk priors
  epsilon[,1] ~ normal(0, lambda_epsilon*sigma_epsilon);
  for(t in 2:TT){
    epsilon[,t] ~ normal(epsilon[,t-1], sigma_epsilon);
  }

  //informative priors for step sizes and starting point variation
  lambda_A ~ normal(5, 5);
  lambda_epsilon ~ normal(5, 5);
  sigma_epsilon ~ weibull(1.25, 1.75);

  //informative priors for question thresholds
  thres_scale ~ lognormal(1, .5);
  {
    int pos = 1;
    for(j in 1:J){ //for each question
      if(no_of_thres[j]>1){
         segment(steps, pos, no_of_thres[j]-1) ~ dirichlet(rep_vector(1, no_of_thres[j]-1));
      }
      pos = pos + no_of_thres[j]-1;
    }
  }

  //informative priors for question difficulty and sensitivity
  mu_beta ~ normal(-1.5, .25); //given thres_scale ~ lognormal(1, .5), this puts expected threshold midpoint at 0
  beta ~ normal(mu_beta, .5);
  gamma ~ normal(0, .5);

  //loglikelihood
  for(ctjp in 1:CTJP){
     //y[ctjp]: number of times we observed option being chosen
     //p[ctjp]: option number
     //alpha[ countries[ctjp] , years[ctjp] ]: country-year concern score
     //segment(thres, ...): question thresholds
     //gamma[j]: question discrimination
     int j = questions[ctjp];
     target += y[ctjp] * ordered_logistic_lpmf( p[ctjp] | gamma[j] *  alpha[ countries[ctjp] , years[ctjp] ] ,  gamma[j] * segment(thres, thres_start[j], no_of_thres[j]) );
  }

}

Have you tried using (some of) the parameter estimates from a model that
does converge as the initial values for parameters? That helped me in a
similar situation.

The problem with these IRT models is typically identifiability.

What does the posterior draws look like when there’s not convergence?

Do you have priors for all of your parameters? Are the locations and scales identified somehow?

You also probably want to use a non-centered parameterization for beta (assuming mu_beta is a parameter, and not data). And you definitely want a non-centered parameterization for A if that’s a parameter.

Thank you both for your ideas. Allen, I am trying out your idea about initial values right now.

Bob, in response to your questions:

  1. I am attaching some select traceplots to show posterior draws for the non-converging model. As you can see, all of the chains are stuck in place (there actually are tiny movements in each chain).

  2. I am not sure I understand what you mean by whether I have priors for all parameters. Would a stan model run if there were prior-less parameters? My identification strategy is through a prior on the abilities, as shown in the example multilevel 2PL model (sections 9.11 and 9.12) in the manual.

  • location: the starting points of the country and continent random walks have priors centered around 0. That means alpha, the ability parameter, also has a prior centered around 0.
  • scale: alpha is transformed to have an a priori sd of 1 in the last period. (see the lines commented with “standardization: last period alphas should have sd=1 a priori” and “combine continent and country terms into alphas”)

I tried the following things (without success in getting convergence) to investigate whether the problem was with identification:

  • Fixing, rather than having priors for, lambda_epsilon, lambda_A and sigma_epsilon (parameters governing the variance in the ability terms).

  • Identifying the model by fixing the location and scale of the difficulty parameters instead of the ability parameters. That is, I transformed the difficulty (beta) parameters to make sure they had mean 0 and sd 1. And instead of having the “sd 1 in last period” transformation on alpha, I just put weakly informative priors on the parameters governing the variance in abilities.

  1. mu_beta is indeed a parameter. I am reading up on non-centered parametrization but still confused about what it is, why it works, how it applies here. Any good resources you can point me to?

Thanks again,

Clara

traceplots.pdf (54.4 KB)

Following up on Allen’s suggestion, I first ran the model on the binarized data. Then I fed it the posterior means of that run, as initial values for a run with non-binarized data.

What happens is that I get signs of convergence for all but the following parameters, which have completely stationary chains:

  • thres, or more specifically those elements of thres that refer to the thresholds between the second and third, third and fourth, fourth and fifth… options. Convergence looks ok for the elements of thres that refer to the threshold between the first and second option (in my model, those are equal to beta, or the difficulty parameter).
  • thres_scale, a parameter that indicates how far apart the thresholds are from one another

These are exactly the parameters which cannot be estimated from binarized data. In other words, the model seems to be converging for all of the parameters, except the ones for which I did not feed it starting values.

I will think about what this means and report back. Meanwhile, more ideas are welcome.

Maybe. Stna is OK with improper priors, but it requires proper posteriors. One way to ensure proper posteriors is to have priors for all the parameters.

Are you getting other messages?

That model’s too complicated for me to understand what you’re trying to do, but I’d recommend trying to debug.

You’ll also find using a non-centered parameterization will work much better for the random walk priors you have.

Hi everyone,

Thanks again for your help–I ended up getting convergence after adjusting the priors on my “steps” parameter, which determines how far the cutpoints of the ordinal logistic PMF will be removed from one another. The model is:

/**
 * Ordinal latent trait model for climate opinion with continent and country random walks
 */

data {

  int<lower=0> C; //number of countries
  int<lower=0> TT; //number of years
  int<lower=0> J;  //number of questions
  int<lower=0> G; //number of continents
  int<lower=0> CTJP; //number of observed country-year-question-options

  int<lower=0> y[CTJP]; //responses per observation (country-year-question-option)
  int<lower=0> p[CTJP]; //options of the observations
  int<lower=0> countries[CTJP]; //countries of the observations
  int<lower=0> years[CTJP]; //years of the observations
  int<lower=0> questions[CTJP]; //questions of the observations
  int<lower=0> c_continents[C]; //continents of the countries

  int<lower=0> P; //total number of thresholds
  int<lower=0> no_of_thres[J]; //number of thresholds for each question (no. of options - 1)
  int<lower=1> thres_start[J]; //index of starting point for question's thresholds

}


parameters {

  real A[G,TT]; //abilities, continent term
  real epsilon[C,TT]; //abilities, country term

  vector<lower=0>[P-J] steps; //steps between question thresholds
  real<lower=0> thres_scale[J]; //scale of the thresholds; should be positive

  real beta[J]; //difficulty
  real<lower=0> gamma[J]; //discrimination; should be positive

  //random walk step sizes and initial variation
  real<lower=0> lambda_A;
  real<lower=0> lambda_epsilon;
  real<lower=0> sigma_epsilon;

}


transformed parameters {

  real kappa; //combined ability standardization term
  real alpha[C,TT]; //abilities, combined term

  vector[P] thres; //question thresholds

  //standardization: last period alphas should have sd=1 a priori
  kappa = sqrt( lambda_A^2 + (TT - 1) + sigma_epsilon^2 * (lambda_epsilon^2 + TT - 1) );

  //combine continent and country terms into alphas
  for(c in 1:C){
    for(t in 1:TT){
      alpha[c,t] = ( A[ c_continents[c],t ] + epsilon[c,t] ) / kappa ;
    }
  }

  //reconstruct thresholds from step sizes
  {
    int pos_steps  = 1;
    int pos_thres = 1;
    for(j in 1:J){ //for each question

      thres[pos_thres] = beta[j]; //set first threshold to difficulty
      if(no_of_thres[j]>1){
         for(i in 1:(no_of_thres[j]-1) ){ //set each next threshold...
           //...to previous thresh. plus step
           thres[pos_thres+i] = thres[pos_thres+i-1] + (steps[pos_steps+i-1] * thres_scale[j] / (no_of_thres[j]-1));
         }
      }

      //thres_start[j] = pos_thres; //store starting point of thresholds for question
      pos_thres = pos_thres + no_of_thres[j]; //move index of threshold vector
      pos_steps = pos_steps + no_of_thres[j] - 1; //move index of steps vector

    }
  }

}


model {

  //continent random walk priors
  A[,1] ~ normal(0, lambda_A);
  for(t in 2:TT){
    A[,t] ~ normal(A[,t-1], 1);
  }

  //country random walk priors
  epsilon[,1] ~ normal(0, lambda_epsilon*sigma_epsilon);
  for(t in 2:TT){
    epsilon[,t] ~ normal(epsilon[,t-1], sigma_epsilon);
  }

  //informative priors for step sizes and starting point variation
  lambda_A ~ normal(5, 5);
  lambda_epsilon ~ normal(5, 5);
  sigma_epsilon ~ weibull(1.25, 1.75);

  //informative priors for question thresholds
  thres_scale ~ lognormal(1, .5);
  {
    int pos = 1;
    for(j in 1:J){ //for each question
      if(no_of_thres[j]>1){
         segment(steps, pos, no_of_thres[j]-1) ~ normal(thres_scale[j] / no_of_thres[j], 1);
      }
      pos = pos + no_of_thres[j]-1;
    }
  }

  //informative priors for question difficulty and sensitivity
  beta ~ normal(0, 1);
  gamma ~ normal(0, .5);

  //loglikelihood
  for(ctjp in 1:CTJP){
     //y[ctjp]: number of times we observed option being chosen
     //p[ctjp]: option number
     //alpha[ countries[ctjp] , years[ctjp] ]: country-year concern score
     //segment(thres, ...): question thresholds
     //gamma[j]: question discrimination
     int j = questions[ctjp];
     target += y[ctjp] * ordered_logistic_lpmf( p[ctjp] | gamma[j] *  alpha[ countries[ctjp] , years[ctjp] ] ,  gamma[j] * segment(thres, thres_start[j], no_of_thres[j]) );
  }

}

Thanks for reporting back.

As Bob predicted, I also got much better convergence with non-centered parametrization.

The following model resulted in Rhats of <1.025 for all parameters after 2000 iterations (1000 warmup, 1000 sampling). The version above still had Rhats > 1.4 for some of the parameters at that point. I simplified the model a bit, but the main difference here is the parametrization.

/**
 * Ordinal latent trait model for climate opinion with continent and country random walks
 */

data {

  int<lower=0> C; //number of countries
  int<lower=0> TT; //number of years
  int<lower=0> J;  //number of questions
  int<lower=0> G; //number of continents
  int<lower=0> CTJP; //number of observed country-year-question-options

  int<lower=0> y[CTJP]; //responses per observation (country-year-question-option)
  int<lower=0> p[CTJP]; //options of the observations
  int<lower=0> countries[CTJP]; //countries of the observations
  int<lower=0> years[CTJP]; //years of the observations
  int<lower=0> questions[CTJP]; //questions of the observations
  int<lower=0> c_continents[C]; //continents of the countries

  int<lower=0> P; //total number of thresholds
  int<lower=0> no_of_thres[J]; //number of thresholds for each question (no. of options - 1)
  int<lower=1> thres_start[J]; //index of starting point for question's thresholds

}


parameters {

  matrix[C,TT] epsilon; //RW steps for abilities

  vector<lower=0>[P-J] steps; //steps between question thresholds

  real beta[J]; //difficulty
  real<lower=0> gamma[J]; //discrimination; should be positive

  //random walk initial variation
  real<lower=0> lambda_epsilon;

}


transformed parameters {

  real alpha[C,TT]; //abilities, combined term

  vector[P] thres; //question thresholds

  //reconstruct abilities from step sizes
  for(c in 1:C){
    alpha[c,1] = epsilon[c,1] * lambda_epsilon ;
    for(t in 2:TT){
      alpha[c,t] = alpha[c,t-1] + ( epsilon[c,t] / (TT-1) );
    }
  }

  //reconstruct thresholds from step sizes
  {

    int pos_steps  = 1;
    int pos_thres = 1;
    for(j in 1:J){ //for each question

      thres[pos_thres] = beta[j]; //set first threshold to difficulty
      if(no_of_thres[j]>1){
         for(i in 1:(no_of_thres[j]-1) ){ //set each next threshold...
           //...to previous thresh. plus step
           thres[pos_thres+i] = thres[pos_thres+i-1] + ( steps[pos_steps+i-1] / (no_of_thres[j]-1) ) ;
         }
      }

      pos_thres = pos_thres + no_of_thres[j]; //move index of threshold vector
      pos_steps = pos_steps + no_of_thres[j] - 1; //move index of steps vector

    }
  }

}


model {

  //priors for country random walk steps
  for(t in 1:TT){
    epsilon[,t] ~ normal(0, 1);
  }

  //informative prior for RW starting point variation
  lambda_epsilon ~ lognormal(0, 1);

  //informative priors for question thresholds
  steps ~ normal(0, 1);

  //informative priors for question difficulty and sensitivity
  beta ~ normal(0, 1);
  gamma ~ normal(0, .5);

  //loglikelihood
  for(ctjp in 1:CTJP){
     //y[ctjp]: number of times we observed option being chosen
     //p[ctjp]: option number
     //alpha[ countries[ctjp] , years[ctjp] ]: country-year concern score
     //segment(thres, ...): question thresholds
     //gamma[j]: question discrimination
     int j = questions[ctjp];
     target += y[ctjp] * ordered_logistic_lpmf( p[ctjp] | gamma[j] * alpha[ countries[ctjp] , years[ctjp] ] ,  gamma[j] * segment(thres, thres_start[j], no_of_thres[j]) );
  }

}
1 Like

Thanks for reporting back on the new parameterization. It’s also a nice example of using sample count times probability in the target to avoid recomputing terms.