2PL IRT / Ideal Point Identification with brms

I am trying to fit a 2PL IRT / ideal point model like the one described by Bafumi, Gelman, Park, and Kaplan (2005): http://www.stat.columbia.edu/~gelman/research/published/171.pdf

But I am having some trouble with identification in brms, particularly when I try to model the parameters hierarchically. Here is what I have been playing around with:

formula_2pl <- bf(y ~ gamma * (theta - beta),
                  theta ~ 1 + (1 | id),
                  beta ~ 1 + (1 |i| bill),
                  gamma ~ 1 + (1 |i| bill),
                  nl = TRUE)

prior_2pl <-
  prior("lkj(2)", class = "cor") +
  prior("normal(0, 1)", class = "sd", nlpar = "gamma") +
  prior("cauchy(0, 1)", class = "sd", nlpar = "beta") +
  prior("cauchy(0, 1)", class = "sd", nlpar = "theta") +
  prior("normal(0, 2)", class = "b", nlpar = "gamma") +
  prior("normal(0, 2)", class = "b", nlpar = "beta") +
  prior("normal(0, 2)", class = "b", nlpar = "theta")

fit_2pl <- brm(
  formula = formula_2pl,
  data = ij_level,
  family = brmsfamily("bernoulli", "logit"),
  prior = prior_2pl,
  seed = 666,
  iter = 2000,
  chains = 4,
  cores = 4,
  backend = "cmdstanr",
  control = list(adapt_delta = 0.95, max_treedepth = 15),
  inits = "0"
)

The main issue manifests as very low effective sample sizes for gamma. I have tried to implement some of the solutions from Low number of effective samples with 2PL latent space model / IRT, but I am not sure how to go about it in brms.
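For reference, this is the kind of check I am using to see which parameters are driving the low ESS (a quick sketch with the posterior package, applied to the fit above):

# Summarize convergence diagnostics for every parameter and sort by bulk ESS,
# so the worst-mixing parameters come first.
library(posterior)
diag_summary <- summarise_draws(as_draws_df(fit_2pl), "rhat", "ess_bulk", "ess_tail")
diag_summary[order(diag_summary$ess_bulk), ]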

I’d hoped that someone else would comment on this first because, while I do a lot of IRT modeling, I’m not an expert on what is and isn’t going to be identified in Bayesian models. To my eye, the reason this is not identified is the multiplication: changing the sign of (theta - beta) can be compensated for by also changing the sign of gamma. That is, -gamma * +(theta - beta) is no different from +gamma * -(theta - beta), so the sign can be flipped arbitrarily (e.g., your prior gives equal probability to -2 and +2 for gamma, so neither sign is uniquely preferred over the other).
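A tiny numeric illustration of that reflection (made-up values):

# Flipping the sign of gamma and of (theta - beta) leaves the linear predictor
# unchanged, so the likelihood cannot tell the two solutions apart.
gamma <- 1.3; theta <- 0.8; beta <- -0.4
gamma * (theta - beta)      # 1.56
(-gamma) * -(theta - beta)  # also 1.56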

There is a thorough preprint on fitting IRT models using brms: here. It includes multiple examples of a 2PL model. There is also a paper applying brms-estimated IRT models, including the 2PL, which comes with a very helpful GitHub repository of all the code used to extract post-hoc information: here.

In short, I believe you will need to constrain the discrimination parameter (gamma) to be positive. In the linked papers, this is done by exponentiating discrimination so that it is estimated on the log scale. You can check the item-total correlations to screen for items that will have negative discrimination (i.e., individuals with a higher latent trait have a lower probability of endorsing the item, which shows up as a negative item-total correlation). For those items, you can reverse score so that a positive discrimination makes sense again. Item fit statistics should indicate whether reverse scoring the “negative” discrimination items improved fit, and you can always check the item characteristic curves to see whether some discrimination parameters are essentially flat lines (i.e., the model is estimating as close to zero as possible because the true slope is negative but is being constrained to be positive).
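For the screening step, something along these lines works (a sketch that assumes a wide person-by-item matrix of 0/1 responses called resp_wide, which is not in the original post):

# Corrected item-total (item-rest) correlations: negative values flag items
# where higher-trait respondents are less likely to endorse the item.
rest_cor <- sapply(seq_len(ncol(resp_wide)), function(j) {
  cor(resp_wide[, j], rowSums(resp_wide[, -j]), use = "pairwise.complete.obs")
})
names(rest_cor) <- colnames(resp_wide)
rest_cor[rest_cor < 0]  # candidates for reverse scoring

# Reverse score the flagged items so a positive discrimination makes sense:
to_flip <- names(rest_cor)[rest_cor < 0]
resp_wide[, to_flip] <- 1 - resp_wide[, to_flip]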

Additionally, I’ve played around with different specifications of the IRT model, and I believe that the standard discrimination * (ability - difficulty) parameterization is not identified, even with a prior/model restricting the discrimination parameter to be positive. I don’t know exactly why it fails to estimate in this form, so maybe someone with more of a math background could explain. The models described in the previously linked papers, however, are very stable and give the same results, even though the formula differs from the standard IRT notation.
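For reference, the parameterization used in those papers looks roughly like this, rewritten with the variable names from the original post (a sketch from memory, so check the preprint for the exact code):

# 2PL where discrimination is estimated on the log scale, so exp(logalpha) is
# always positive; person ability and item easiness are combined into a single
# linear predictor eta instead of the (theta - beta) difference.
formula_2pl_alt <- bf(y ~ exp(logalpha) * eta,
                      eta ~ 1 + (1 | id) + (1 |i| bill),
                      logalpha ~ 1 + (1 |i| bill),
                      nl = TRUE)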

----- EDIT -----

Since my initial response, a recent post has gone out on the forum announcing a beta version of a Stan package designed specifically for ideal point models: New R package release: idealstan v1.0 beta (Ideal point/IRT models in Stan). I suspect that this is likely an ideal option for your needs.

Thanks for linking to my package, and yes idealstan is made for the IRT ideal point model in the Bafumi paper cited above by @bwilden.

As you mentioned, you can’t fit that model in brms or any IRT package based on the traditional psychometric model, because all of the discrimination parameters are constrained to be positive. That makes sense in the original application to test scores (you can’t decrease ability by getting questions right). In other domains, though, such as any kind of polarizing dynamic with two sides, it doesn’t. I also find the ideal point model useful any time there is uncertainty about the direction of the items, such as with a new scale.

The idealstan package allows for hierarchical covariates for person/ideal point scores and item discriminations, so it should work for your use case. If you want to learn more about coding (traditional) IRT models in Stan directly, I encourage you to check out the edstan package and accompanying vignettes: https://education-stan.github.io/.

Let me know if you have any questions!


Thanks @wgoette and @saudiwin!

After a lot more experimentation, I eventually landed upon a model that could be fit in brms given that the discrimination parameter is not necessarily positive in this context.

formula_2pl <- bf(position ~ gamma * theta + beta,
                  theta ~ 1 + group_type + (1 | id),
                  beta ~ 1 + party + (1 |i| bill),
                  gamma ~ 1 + party + (1 |i| bill),
                  nl = TRUE)

prior_2pl <-
  prior("lkj(1)", class = "cor") +
  prior("exponential(.5)", class = "sd", nlpar = "gamma") +
  prior("exponential(1)", class = "sd", nlpar = "beta") +
  prior("exponential(.5)", class = "sd", nlpar = "theta") +
  prior("normal(.4, .01)", class = "b", coef = "group_typebusiness", nlpar = "theta") +
  prior("normal(-.4, .01)", class = "b", coef = "group_typelabor", nlpar = "theta") +
  prior("normal(.2, .01)", class = "b", coef = "partyR", nlpar = "beta") +
  prior("normal(-.2, .01)", class = "b", coef = "partyD", nlpar = "beta") +
  prior("normal(.4, .01)", class = "b", coef = "partyR", nlpar = "gamma") +
  prior("normal(-.4, .01)", class = "b", coef = "partyD", nlpar = "gamma") +
  prior("normal(0, .5)", class = "b", nlpar = "gamma") +
  prior("normal(0, 1)", class = "b", nlpar = "beta") +
  prior("normal(0, 2)", class = "b", nlpar = "theta")

fit_sim_2pl <- brm(
  formula = formula_2pl,
  data = ij_level,
  family = brmsfamily("bernoulli", "probit"),
  prior = prior_2pl,
  iter = 2000,
  chains = 4,
  backend = "cmdstanr",
  inits = "0"
)

For a bit more background on my problem: I am trying to infer the left-right ideal points (the theta parameter) of interest groups from their signals of support/opposition to legislative bills. The model fit much better when I used some covariates to help predict the thetas, betas, and gammas. Along with this, I set really strong priors in certain directions (for example, N(0.4, 0.01) for the coefficient corresponding to the group being associated with business interests, and N(0.2, 0.01) for the coefficient for Republican bill sponsorship). Ideally I would simply constrain these sorts of coefficients to be either positive or negative (depending on background knowledge of their expected direction), but brms apparently doesn’t allow these types of coefficients to have upper or lower bounds.

This model gave me pretty good ESS and Rhat numbers, and it fit without divergent transitions. I am still pretty new to the world of Bayesian modeling, though, so I am not sure what other diagnostic measures I should be checking to ensure reliable inferences. If it helps, the model also almost perfectly recovered the parameter values from simulated data.
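For reference, the recovery check looked roughly like this (a simplified sketch with made-up sizes and without the group_type/party covariates, not my exact simulation code):

# Simulate support/opposition signals from the gamma * theta + beta model with
# a probit link, refit the brms model on sim, and compare the posterior means
# of the thetas (and betas/gammas) to the true values drawn here.
set.seed(123)
n_group <- 50
n_bill <- 100
theta_true <- rnorm(n_group)
beta_true <- rnorm(n_bill)
gamma_true <- rnorm(n_bill, 0, 0.5)
sim <- expand.grid(id = seq_len(n_group), bill = seq_len(n_bill))
eta <- gamma_true[sim$bill] * theta_true[sim$id] + beta_true[sim$bill]
sim$position <- rbinom(nrow(sim), 1, pnorm(eta))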

But I am really excited to try out idealstan @saudiwin ! It will be interesting to compare any differences in the models, and I’m sure I will learn a lot more.

Where did your strong priors come from? These are very informative priors to my eye, which is not inherently incorrect so long as you have good justification for such strong prior beliefs, or enough data to shift them.

IMO, if you are already so confident in what these coefficients and their signs will be, then I’m not really sure what you’re getting out of running the model, especially if you have a relatively small sample where the prior will dominate the posterior. Alternatively, my concern would be that you’ve managed to work with your data and this model long enough to find something that works but won’t generalize. More information about the simulation would be good too, particularly whether you ran it over multiple different true conditions to see whether the model worked each time.

Yes, you have essentially the Bafumi et al. model, but some of those parameters still need to be constrained for identification.

In general, I don’t think that constraining hierarchical predictors is usually very useful, especially as we often want to know the unconstrained values of those predictors. What you should do in the brms code is pin a legislator ideal point or discrimination parameter to fixed values (e.g., -1 and +1). That will allow you to estimate the hierarchical predictors without having to constrain or fix them.
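One way this pinning might be done in brms (a sketch I haven’t tested; anchor_left and anchor_right are hypothetical 0/1 indicator columns marking two anchor legislators, not variables from the post above):

# Put the two anchor indicators in the fixed-effects part of theta and fix
# their coefficients with constant() priors. For the anchors to be truly
# pinned, they should also be dropped from the (1 | id) term (or given a
# shared dummy id level); otherwise the group-level intercept still moves them.
formula_anchored <- bf(position ~ gamma * theta + beta,
                       theta ~ 0 + anchor_left + anchor_right + (1 | id),
                       beta ~ 1 + party + (1 |i| bill),
                       gamma ~ 1 + party + (1 |i| bill),
                       nl = TRUE)

prior_anchored <-
  prior("constant(-1)", class = "b", coef = "anchor_left", nlpar = "theta") +
  prior("constant(1)", class = "b", coef = "anchor_right", nlpar = "theta")
# (remaining priors as in the model above)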

Of course all of this is straightforward in idealstan, which has a bunch of other features like a missing-data model designed for non-ignorable missingness. But I think the brms code will allow you to estimate the basic 2PL model, assuming you can fix a particular legislator/item, which seems like it should be possible.