Unexpected uncertainty in posterior predicitive checks with Ordinal models

Hi, I am fitting a Proportional Odds model in brms (family = cumulative("logit")) with random-intercepts.
@joels found greater uncertainties around predictions with posterior_predict(..., re_formula = NA) than with re_formula = NULL, which we wouldn’t expect.

As per brms documention:

re_formula formula containing group-level effects to be considered in the prediction. If NULL (default), include all group-level effects; if NA, include no group-level effects.

What is the explanation for this phenomenon?

pacman::p_load(brms,
               patchwork)


# Data
a <- c(rep(0,0), rep(1,3), rep(2,5), rep(3,5), rep(4,25), rep(5,40), rep(6,24))
b <- c(rep(0,2), rep(1,3), rep(2,9), rep(3,17), rep(4,33), rep(5,18), rep(6,18))
x <- c(rep('medical', length(a)), rep('endovascular', length(b)))
y <- c(a, b)
d1 = data.frame(x, y, study = "RESCUE-Japan LIMIT")


a <- c(rep(0,0), rep(1,3), rep(2,9), rep(3,20), rep(4,36), rep(5,32), rep(6,71))
b <- c(rep(0,2), rep(1,9), rep(2,25), rep(3,31), rep(4,27), rep(5,15), rep(6,68))
x <- c(rep('medical', length(a)), rep('endovascular', length(b)))
y <- c(a, b)
d2 = data.frame(x, y, study = "SELECT2")

a <- c(rep(0,0), rep(1,8), rep(2,18), rep(3,49), rep(4,60), rep(5,45), rep(6,45))
b <- c(rep(0,9), rep(1,19), rep(2,41), rep(3,39), rep(4,45), rep(5,27), rep(6,50))
x <- c(rep('medical', length(a)), rep('endovascular', length(b)))
y <- c(a, b)
d3 = data.frame(x, y, study = "ANGEL-ASPECT")

d = rbind(d1, d2, d3)

d$y <-factor(d$y, ordered = T)
d$x<-factor(d$x, levels = c("endovascular", "medical"))


PO_brms <- brms::brm(
  family = cumulative("logit"), # PO model
  formula = y ~ x  + (1 | study),
                      
  prior = 
    prior(normal(0, 1.5), class = "b") + 
    prior(student_t(4, 0, 2.5), class = "sd"),
  
  iter = 2000,
  chain = 4,
  cores = parallel::detectCores(),
  control = list(adapt_delta = .99),
  backend = "cmdstanr",
  seed = 123,
  data = d
  )

p1a =  pp_check(PO_brms, group = "x", type = "bars_grouped", ndraws = 1000,
                re_formula=NULL) +
  labs(title = "re_formula=NULL") + coord_cartesian(ylim = c(0, 220))

p1b = pp_check(PO_brms, group = "x", type = "bars_grouped", ndraws = 1000,
               re_formula=NA) +
  labs(title = "re_formula=NA") + coord_cartesian(ylim = c(0, 220))

p1a + p1b + patchwork::plot_layout(guides = 'collect')

When we use re_formula = NULL, we are conditioning the predictions on the group labels, and we know the expected values for the group values well because we have seen their data. So these predictions will have less uncertainty. When we use re_formula = NA, we are 0-ing out the random intercept value and making the prediction for a statistically typical group. We only have 3 groups here. We don’t know what the typical group should look like, so we have more uncertainty.

Now, if we are making predictions for a complete new, as-yet unseen group, we need to make a prediction using a newdata argument with a unique group label and sample_new_levels = TRUE. This will incorporate the uncertainty around the statistically typical group (fixed effects) and the uncertainty for a variability in groups (SD of the random intercepts).

1 Like

How interesting, but I’m still a little confused. My model’s formula is:
y ~ x + (1 | study)

In the post above, I grouped PPCs by "x".

Now , I ran pp_check(..., group = "study").

Intervals are still wider with re_formula = NA, but actually I would expect to get an error with pp_check(..., group = "study", re_formula=NA) since:

When we use re_formula = NA , we are 0-ing out the random intercept value and making the prediction for a statistically typical group.

How does brms yield different predictions by study with re_formula = NA?

The plot above must be a bug, given this:

> nd = d |> modelr::data_grid(x, study)
> tidybayes::predicted_rvars(object = PO_brms,
+                        newdata = nd,
+                        re_formula = NA)
# A tibble: 6 × 3
  x            study              .prediction
  <fct>        <chr>               <rvar[1d]>
1 endovascular ANGEL-ASPECT         4.8 ± 1.6
2 endovascular RESCUE-Japan LIMIT   4.8 ± 1.6
3 endovascular SELECT2              4.8 ± 1.6
4 medical      ANGEL-ASPECT         5.2 ± 1.6
5 medical      RESCUE-Japan LIMIT   5.2 ± 1.6
6 medical      SELECT2              5.2 ± 1.6