Hi all,
I am running into trouble with convergence, with an ordinal IRT model (this thread is about the same model). I am trying to model public concern about climate change in different countries over time, based on survey data. The data looks like this:
country----question----option----resp. who picked option----total no. of respondents
Belgium—A--------------1----------345------------------------------1000
Belgium—A--------------2----------278------------------------------1000
etc.
So, for each country in each year, for each survey question in the data set, I know how many respondents picked each of the answer options. Questions could be on a scale from 1-4, or 1-10, or 1-2, or whatever. Each question is coded in such a way that lower responses mean greater concern about climate change. Data has 9000 rows and is based on 2.5 million responses (i.e., column 4 sums to 2.5 million).
The model consists of:
- A random walk that describes the evolution of climate concern on each continent, plus a random walk for each country’s individual evolution. The country-year “ability” terms (alpha) that go into the IRT part are the sum of these two.
- An ordered IRT model connecting abilities to respondents’ choices of answer options through an ordered logit model. It is similar to a 2PL IRT, with a difficulty (beta) and a discrimination (gamma) parameter. However, as the data is ordinal, the probability of picking an answer option is determined by comparing ability (alpha) to a set of thresholds.
I am pasting the full model below. Priors are (weakly) informative because earlier versions of this model only converged with such priors (see point 1. below for an example). With the data above, I don’t get any convergence at all–Rhats in the millions, effective sample sizes equal 2.002 for all parameters regardless of chain length. All chains look completely stuck in place. (happy to share specific outputs if it helps).
I have tried the following things:
- Binarizing the data by splitting each question’s scale in half. So now, all respondents who picked an option on the left side of the scale score 1, on the right hand side it’s 2. This works–the model converges.
- Binarizing all of the data except for two questions, for which I kept data about three answer options. No convergence.
- Fixing various parameters in the model instead of putting priors on them. I tried this for mu_beta, thres_scale (IRT parameters), lambda_epsilon, lambda_A and sigma_epsilon (parameters governing the variance in the ability terms). No convergence.
- Leaving out the continent random walk part. No convergence.
- Identifying the model by fixing the location and scale of the IRT parameters instead of the ability parameters (the model below is identified by fixing the standard deviation of the continent random walks to 1 and the mean of all random walks to 0). No convergence.
- Running the model on simulated data, where parameters are drawn from their priors. No convergence, even when I generate twice as much data as I actually have.
I am especially puzzled by what happens between 1. and 2. Apparently, even having just a few non-binary questions in the data breaks the model completely. What could be happening?
Thanks in advance,
Clara
/**
* Ordinal latent trait model for climate opinion with continent and country random walks
*/
data {
int<lower=0> C; //number of countries
int<lower=0> TT; //number of years
int<lower=0> J; //number of questions
int<lower=0> G; //number of continents
int<lower=0> CTJP; //number of observed country-year-question-options
int<lower=0> y[CTJP]; //responses per observation (country-year-question-option)
int<lower=0> p[CTJP]; //options of the observations
int<lower=0> countries[CTJP]; //countries of the observations
int<lower=0> years[CTJP]; //years of the observations
int<lower=0> questions[CTJP]; //questions of the observations
int<lower=0> c_continents[C]; //continents of the countries
int<lower=0> P; //total number of thresholds
int<lower=0> no_of_thres[J]; //number of thresholds for each question (no. of options - 1)
int<lower=1> thres_start[J]; //index of starting point for question's thresholds
}
parameters {
real A[G,TT]; //abilities, continent term
real epsilon[C,TT]; //abilities, country term
vector<lower=0>[P-J] steps; //steps between question thresholds
real<lower=0> thres_scale[J]; //scale of the thresholds; should be positive
real mu_beta; //mean difficulty
real beta[J]; //difficulty
real<lower=0> gamma[J]; //discrimination; should be positive
//random walk step sizes and initial variation
real<lower=0> lambda_A;
real<lower=0> lambda_epsilon;
real<lower=0> sigma_epsilon;
}
transformed parameters {
real kappa; //combined ability standardization term
real alpha[C,TT]; //abilities, combined term
vector[P] thres; //question thresholds
//standardization: last period alphas should have sd=1 a priori
kappa = sqrt( lambda_A^2 + (TT - 1) + sigma_epsilon^2 * (lambda_epsilon^2 + TT - 1) );
//combine continent and country terms into alphas
for(c in 1:C){
for(t in 1:TT){
alpha[c,t] = ( A[ c_continents[c],t ] + epsilon[c,t] ) / kappa ;
}
}
//reconstruct thresholds from step sizes and scale
{
int pos_steps = 1;
int pos_thres = 1;
for(j in 1:J){ //for each question
thres[pos_thres] = beta[j]; //set first threshold to difficulty
if(no_of_thres[j]>1){
for(i in 1:(no_of_thres[j]-1) ){ //set each next threshold...
//...to previous thresh. plus step, scaled
thres[pos_thres+i] = thres[pos_thres+i-1] + (steps[pos_steps+i-1] * thres_scale[j]);
}
}
//thres_start[j] = pos_thres; //store starting point of thresholds for question
pos_thres = pos_thres + no_of_thres[j]; //move index of threshold vector
pos_steps = pos_steps + no_of_thres[j] - 1; //move index of steps vector
}
}
}
model {
//continent random walk priors
A[,1] ~ normal(0, lambda_A);
for(t in 2:TT){
A[,t] ~ normal(A[,t-1], 1);
}
//country random walk priors
epsilon[,1] ~ normal(0, lambda_epsilon*sigma_epsilon);
for(t in 2:TT){
epsilon[,t] ~ normal(epsilon[,t-1], sigma_epsilon);
}
//informative priors for step sizes and starting point variation
lambda_A ~ normal(5, 5);
lambda_epsilon ~ normal(5, 5);
sigma_epsilon ~ weibull(1.25, 1.75);
//informative priors for question thresholds
thres_scale ~ lognormal(1, .5);
{
int pos = 1;
for(j in 1:J){ //for each question
if(no_of_thres[j]>1){
segment(steps, pos, no_of_thres[j]-1) ~ dirichlet(rep_vector(1, no_of_thres[j]-1));
}
pos = pos + no_of_thres[j]-1;
}
}
//informative priors for question difficulty and sensitivity
mu_beta ~ normal(-1.5, .25); //given thres_scale ~ lognormal(1, .5), this puts expected threshold midpoint at 0
beta ~ normal(mu_beta, .5);
gamma ~ normal(0, .5);
//loglikelihood
for(ctjp in 1:CTJP){
//y[ctjp]: number of times we observed option being chosen
//p[ctjp]: option number
//alpha[ countries[ctjp] , years[ctjp] ]: country-year concern score
//segment(thres, ...): question thresholds
//gamma[j]: question discrimination
int j = questions[ctjp];
target += y[ctjp] * ordered_logistic_lpmf( p[ctjp] | gamma[j] * alpha[ countries[ctjp] , years[ctjp] ] , gamma[j] * segment(thres, thres_start[j], no_of_thres[j]) );
}
}