IRT: Checking non-uniform DIF with brms

Dear Paul,

I have read your publications and code on IRT models. Checking for uniform DIF is clear to me and pretty straightforward. I now also want to check for non-uniform DIF. We compare the same questions (binary outcome) across two countries. As you describe in your paper “Bayesian Item Response Modeling in R with brms and Stan”, I want to use a “maximal multilevel approach” to check for potential non-uniform DIF. Country is thus a person covariate whose effect varies over items. I came up with two approaches that have so far yielded similar results, and I wonder which is the better choice (or whether one of them is wrong):
First approach:

formula_4pl_dif <- bf(
  response ~ gamma + (1 - gamma - psi) * 
    inv_logit(beta + exp(logalpha) * theta),
  nl = TRUE,
  theta ~ 0 + (1 | person),
  beta ~ 1 + country + (1 + country |i| item),
  logalpha ~ 1 + country + (1 + country |i| item),
  logitgamma ~ 1 + country + (1 + country |i| item),
  nlf(gamma ~ inv_logit(logitgamma)),
  logitpsi ~ 1 + country + (1 + country |i| item),
  nlf(psi ~ inv_logit(logitpsi)),
  family = brmsfamily("bernoulli", link = "identity")
)
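
To make the distinction between uniform and non-uniform DIF concrete, here is a minimal base R sketch of the response function written in the formula above. The helper name `p_4pl` and all numeric values are made up for illustration; nothing here is fitted with brms. It only shows that a country difference in `beta` moves the curve in one direction, while a difference in `logalpha` makes the curves cross.

```r
# Sketch of the 4PL response function from the formula above (base R only).
# p_4pl and all parameter values are hypothetical, chosen for illustration.
p_4pl <- function(theta, beta, logalpha, logitgamma, logitpsi) {
  gamma <- plogis(logitgamma)  # lower asymptote (guessing)
  psi   <- plogis(logitpsi)    # 1 - psi is the upper asymptote
  gamma + (1 - gamma - psi) * plogis(beta + exp(logalpha) * theta)
}

theta <- c(-2, 0, 2)

# Reference country
p_ref <- p_4pl(theta, beta = 0, logalpha = 0, logitgamma = -3, logitpsi = -3)

# Uniform DIF: only the easiness (beta) differs between countries
p_uni <- p_4pl(theta, beta = 0.5, logalpha = 0, logitgamma = -3, logitpsi = -3)

# Non-uniform DIF: the discrimination (logalpha) differs between countries
p_non <- p_4pl(theta, beta = 0, logalpha = log(2), logitgamma = -3, logitpsi = -3)

diff_uni <- p_uni - p_ref  # same sign at every theta
diff_non <- p_non - p_ref  # negative below theta = 0, positive above

print(round(rbind(theta, diff_uni, diff_non), 3))
```

Under these assumed values, a test on the country effect in `beta` alone would pick up the first pattern, whereas the second pattern shows up in the country effect on `logalpha`.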

I then test whether the estimates differ from 0:

hypothesis(fit_4pl, "beta_countrya < 0", scope = "coef", group = "item", alpha = 0.05)
hypothesis(fit_4pl, "beta_countrya > 0", scope = "coef", group = "item", alpha = 0.05)

For the second approach I followed what you describe in the paper for gender, basically using (0 + country |i| item), which gives each country its own varying effect.
Second approach:

formula_4pl_dif <- bf(
  response ~ gamma + (1 - gamma - psi) * 
    inv_logit(beta + exp(logalpha) * theta),
  nl = TRUE,
  theta ~ 0 + (1 | person),
  beta ~ 1 + country + (0 + country |i| item),
  logalpha ~ 1 + country + (0 + country |i| item),
  logitgamma ~ 1 + country + (0 + country |i| item),
  nlf(gamma ~ inv_logit(logitgamma)),
  logitpsi ~ 1 + country + (0 + country |i| item),
  nlf(psi ~ inv_logit(logitpsi)),
  family = brmsfamily("bernoulli", link = "identity")
)

Then I would check for differences:

hypothesis(fit_4pl, "beta_countrya = beta_countryb", scope = "coef", group = "item", alpha = 0.05)

What would you recommend?
Thanks for your help!

  • Operating System: Windows
  • brms Version: 2.12.0

The two approaches should provide the same results if, in the first case, you don’t use two hypotheses with > and < but only one with =.
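
The equivalence can be checked on the design matrices alone. The following base R sketch uses a toy factor with levels "a" and "b" and assumes "b" is the reference level, so that the dummy column is named `countrya` as in the hypotheses above; the parameter values are invented. It shows that `1 + country` and `0 + country` describe the same linear predictor, with the treatment-coded slope equal to the difference of the two cell means.

```r
# Toy factor; the level order (reference = "b") is an assumption made so that
# the dummy column is called "countrya", matching the names in the post.
country <- factor(c("a", "a", "b", "b"), levels = c("b", "a"))

X_treat <- model.matrix(~ 1 + country)  # columns: (Intercept), countrya
X_cell  <- model.matrix(~ 0 + country)  # columns: countryb, countrya

# Made-up item-specific values under the cell-means coding (second approach)
m_cell <- c(countryb = -0.3, countrya = 0.4)

# Implied values under the treatment coding (first approach)
b_treat <- c("(Intercept)" = m_cell[["countryb"]],
             countrya      = m_cell[["countrya"]] - m_cell[["countryb"]])

eta_cell  <- drop(X_cell  %*% m_cell[colnames(X_cell)])
eta_treat <- drop(X_treat %*% b_treat[colnames(X_treat)])

# Same linear predictor, so "countrya = 0" in the first coding corresponds
# to "countrya = countryb" in the second
print(all.equal(eta_cell, eta_treat))
```

This is only the algebra of the two codings. In a fitted model, the priors on the standard deviations and correlations apply to different quantities in each coding, so the posteriors may not match exactly.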


I’ve been looking for some documentation on DIF with this parameterization (i.e., where theta and item difficulty are estimated “separately” versus combined – eta ~ 1 + (1 | item) + (1 | person)), so I’m glad to have found this old post.

I do have a conceptual question about the implications of estimating theta “separately” from difficulty in the non-linear equations. Specifically, my concern is impact. Say that the two groups (in OP’s case, countries) have different latent trait means and/or variances. If this difference is not modeled, potentially all items would show DIF, but that would be due to impact (latent trait mean differences) rather than genuine DIF.
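
The concern can be written out as plain algebra. The sketch below uses invented item parameters and a hypothetical latent mean difference `delta`. It is not a statement about how brms identifies the model. It only shows that, in a predictor of the form beta + alpha * theta, a mean shift in theta is indistinguishable from item-specific easiness shifts of size alpha * delta.

```r
# Made-up parameters for three items (predictor of the form beta + alpha * theta)
beta  <- c(-1, 0, 1)
alpha <- c(0.5, 1, 2)
delta <- 0.8  # hypothetical latent mean difference (impact) for one country
theta <- 0.3  # a person's deviation from their own country's mean

# Written with an explicit latent mean shift
eta_impact <- beta + alpha * (theta + delta)

# The same predictor written as per-item easiness shifts and no mean shift
beta_shifted <- beta + alpha * delta
eta_dif <- beta_shifted + alpha * theta

print(all.equal(eta_impact, eta_dif))
print(beta_shifted - beta)  # 0.4, 0.8, 1.6 (= alpha * delta)
```

With equal discriminations the apparent shift would be the same constant for every item, which a population-level country effect on beta could absorb. With unequal discriminations, as here, the apparent shifts differ by item and could look like item-specific DIF.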

When difficulty and theta are estimated on the same line, including the grouping variable should obviate the concern about impact (as noted in the Bayesian IRT in brms preprint referenced by OP). My ultimate question is this: when theta is estimated separately in the non-linear specification, would we not need something like this (for a simpler 2PL model):

data$countryA <- (data$country == "A")+0
data$countryB <- (data$country == "B")+0

# "response" is the outcome column name used in the original post's formulas
Impact.form <- bf(response ~ beta + exp(logalpha) * theta,
                  theta ~ 0 + (-1 + countryA | person) + (-1 + countryB | person),
                  beta ~ 1 + country + (1 + country |i| item),
                  logalpha ~ 1 + country + (1 + country |i| item),
                  nl = TRUE)

In this case, theta explicitly models two independent latent traits, one per country, allowing for possible mean and/or variance differences. Is it necessary to be this explicit about such group-level differences, or does including the grouping variable on the item parameters implicitly allow for latent trait differences between groups?