Conditional multilevel IRT and the level-2 correlation structure

  • Operating System: MacOS
  • brms Version: 2.16.2

I’m fitting a multilevel IRT model to experimental data. All 223 participants responded to the same 5 items under each of 3 experimental conditions (control, healthy, distressed). All five items were Likert-type questions with 6 response options. I’ve been modeling the Likert-type ratings (rating) as a function of condition, with the ratings nested in both item and id (the participant identifier). My preferred likelihood is family = cumulative(probit).
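
In case the structure helps, the data are in long format. Here’s a hypothetical stand-in for d_avoidance_items (not the real data; the column names are just the ones the formulas below assume):

# hypothetical stand-in for d_avoidance_items, only to show the assumed
# long format: one row per id x item x condition, ratings in 1:6
d_sketch <- expand.grid(
  id        = factor(1:223),
  item      = factor(1:5),
  condition = factor(c("control", "healthy", "distressed"))
)
d_sketch$rating <- sample(1:6, size = nrow(d_sketch), replace = TRUE)  # placeholder ratings
nrow(d_sketch)  # 3345 rows = 223 ids x 5 items x 3 conditions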

Thus, my initial formula syntax was:

rating | thres(gr = factor(item)) ~ 1 + condition + (1 + condition | item) + (1 + condition | id)

However, I wanted to make the level-2 correlation structure more meaningful, so I also reparameterized the model formula like so:

rating | thres(gr = factor(item)) ~ 1 + condition + (0 + condition | item) + (0 + condition | id)
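
To spell out what the reparameterization does (a sketch, using the item grouping as the example): with 1 + condition, the item-level effect in condition c is the sum of a varying intercept and a varying offset, u_{0j} + u_{cj} (the offset is fixed to zero for the reference condition), so the implied variance in condition c is

\operatorname{Var}(u_{0j} + u_{cj}) = \sigma_0^2 + \sigma_c^2 + 2\rho_{0c}\,\sigma_0\sigma_c,

and the correlations are between intercepts and offsets. With 0 + condition, each condition gets its own varying effect v_{cj} with variance \sigma_c^2, so the sd parameters refer to the conditions directly and the correlations are between conditions. The two codings span the same space of effects, but the sd and lkj priors now apply to different quantities.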

My initial assumption was that this would change the level-2 sd/correlation parameters a bit and maybe help or hurt the effective sample size estimates. But I presumed it would not change the level-1 thresholds or β parameters. I was wrong.

Leaving out technical settings like cores and so on, here’s the code I used to fit the models:

library(brms)

# 26.01536 mins
avoid_3_all <-
  brm(data = d_avoidance_items,
      family = cumulative(probit),
      rating | thres(gr = factor(item)) ~ 1 + condition + (1 + condition | item) + (1 + condition | id),
      prior = c(prior(normal(0, 2), class = Intercept),
                prior(normal(0, 1.5), class = b),
                prior(exponential(1.0 / 1.5), class = sd),
                prior(lkj(1), class = cor)))

# 15.6703 mins
avoid_3b_all <-
  brm(data = d_avoidance_items,
      family = cumulative(probit),
      rating | thres(gr = factor(item)) ~ 1 + condition + (0 + condition | item) + (0 + condition | id),
      prior = c(prior(normal(0, 2), class = Intercept),
                prior(normal(0, 1.5), class = b),
                prior(exponential(1.0 / 1.5), class = sd),
                prior(lkj(1), class = cor)))
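
As an aside, a quick way to see exactly which level-2 sd/correlation parameters each formula implies is get_prior(), which lists them (along with the default priors) without fitting anything:

# list the group-level parameters implied by the 0 + condition formula,
# without fitting; swap in the 1 + condition formula to compare
get_prior(rating | thres(gr = factor(item)) ~ 1 + condition +
            (0 + condition | item) + (0 + condition | id),
          data = d_avoidance_items,
          family = cumulative(probit))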

The thing I don’t understand is that the threshold posteriors for the first model are about twice as wide as those for the second model. Here’s what that looks like in a coefficient plot:

library(tidyverse)

# rows 1:39 of posterior_summary() hold the thresholds, betas, and
# level-2 sd/cor summaries (the r_ rows come after)
bind_rows(posterior_summary(avoid_3_all)[1:39, ] %>% data.frame() %>% rownames_to_column("parameter"),
          posterior_summary(avoid_3b_all)[1:39, ] %>% data.frame() %>% rownames_to_column("parameter")) %>% 
  mutate(fit = rep(c("avoid_3_all", "avoid_3b_all"), each = n() / 2)) %>% 
  ggplot(aes(x = parameter, y = Estimate, ymin = Q2.5, ymax = Q97.5, color = fit)) + 
  geom_rect(xmin = 2.5, xmax = 27.5, ymin = -Inf, ymax = Inf,
            fill = "grey95", size = 0) +
  geom_hline(yintercept = 0, size = 1/4, color = "grey67") +
  geom_pointrange(size = 1/3, fatten = 3/4, position = position_dodge(width = -0.6)) +
  scale_color_manual(values = c("black", "red3")) +
  xlab(NULL) +
  coord_flip() +
  theme_bw() +
  theme(axis.text.y = element_text(hjust = 0),
        panel.grid = element_blank())

Though it’s no surprise the level-2 sd/correlations have changed (upper white section), I’m shocked by how much more precise the thresholds are for the second model (in red in the middle gray section).

Also, even though both models were fit to the same data, with the same predictors and the same number of parameters, their LOO/WAIC values are a little different.

avoid_3_all <- add_criterion(avoid_3_all, criterion = c("loo", "waic"))
avoid_3b_all <- add_criterion(avoid_3b_all, criterion = c("loo", "waic"))

loo_compare(avoid_3_all, avoid_3b_all) %>% print(simplify = F)
             elpd_diff se_diff elpd_loo se_elpd_loo p_loo se_p_loo  looic se_looic
avoid_3_all        0.0     0.0  -2663.0        58.6 473.9     18.9 5325.9    117.3
avoid_3b_all      -0.6     1.6  -2663.6        58.7 475.6     19.1 5327.2    117.4

Can anyone help me understand why my simple reparameterization of the level-2 sd/correlation structure has such a profound effect on the other parts of the model?

If it’s of any help, you can download the two fit objects from GitHub: ASKurz/Conditional-multilevel-IRT-and-the-level-2-correlation-structure

Tagging @matti for good measure.

I could be way off since I don’t work much with ordinal models, but wouldn’t this be because your first model contains varying intercepts for item and the second model does not? The thresholds relate to the intercepts, correct? So in the second model they are more precise because there are no varying intercepts.
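
Concretely, here’s a sketch of the identifiability issue (if I have it right). With item-specific thresholds, the model for person i and item j is

\Pr(\text{rating}_{ij} \le k) = \Phi(\tau_{jk} - \eta_{ij}),

and in your first model \eta_{ij} contains an item-varying intercept u_{0j}. Shifting u_{0j} by any constant \delta_j while shifting all five of item j’s thresholds \tau_{j1}, \dots, \tau_{j5} by the same \delta_j leaves every probability unchanged. So the item-specific thresholds and the varying item intercepts are only separated by their priors, and that soft identification would show up as extra posterior width on the thresholds.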

You know what? After reading your comment and investigating with other data, I’m now realizing this is a more general issue than multilevel IRT. As you indicate in your comment, what I initially thought of as a simple reparameterization isn’t so simple after all. I haven’t thought this through. Just when you think you understand things, you find out that you do not, indeed, understand things.
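
If anyone wants to poke at this outside of IRT, here’s a minimal simulation sketch (made-up data, one grouping factor, no item structure) that sets up the same two parameterizations:

# minimal sketch with made-up data: the same 1-vs-0 parameterizations of a
# cumulative(probit) model, with a single grouping factor and no items
library(brms)
set.seed(1)

sim <- expand.grid(id = factor(1:100), condition = factor(c("a", "b", "c")))
latent <- rnorm(nrow(sim), mean = as.numeric(sim$condition) / 2)  # crude latent scores
sim$rating <- cut(latent, breaks = c(-Inf, -0.5, 0.5, 1.5, Inf), labels = FALSE)

fit_1 <- brm(rating ~ 1 + condition + (1 + condition | id),
             data = sim, family = cumulative(probit))
fit_0 <- brm(rating ~ 1 + condition + (0 + condition | id),
             data = sim, family = cumulative(probit))

# the first 3 rows of posterior_summary() are the thresholds (b_Intercept[k]);
# compare their widths across the two parameterizations
posterior_summary(fit_1)[1:3, ]
posterior_summary(fit_0)[1:3, ]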

It is an interesting example, and I think I agree that the threshold uncertainty trades off with the intercepts here. I’m also not sure the IRT people would call this a multilevel IRT model; I think it might instead be called an IRT model with random item effects.

For it to be called a multilevel IRT model, I think you would need participants to be nested in some other grouping variable, like school or country. Maybe condition serves that purpose here, but condition seems to be treated as a fixed covariate instead (it does not appear on the right side of a |). But I might misunderstand the model syntax…

When I call it “multilevel IRT,” I’m simply referring to the GLMM IRT framework Paul used in his paper: https://arxiv.org/pdf/1905.09501.pdf. I get that folks who specialize in psychometrics use the term “multilevel” differently. A similar example is how confusing the term “hierarchical” can be in the SEM context.
