Unexpected uncertainty in posterior predicitive checks with Ordinal models

arthur-albuquerque · February 24, 2023, 1:18pm

Hi, I am fitting a Proportional Odds model in brms (family = cumulative("logit")) with random-intercepts.
@joels found greater uncertainties around predictions with posterior_predict(..., re_formula = NA) than with re_formula = NULL, which we wouldn’t expect.

As per brms documention:

re_formula formula containing group-level effects to be considered in the prediction. If NULL (default), include all group-level effects; if NA, include no group-level effects.

What is the explanation for this phenomenon?

pacman::p_load(brms,
               patchwork)


# Data
a <- c(rep(0,0), rep(1,3), rep(2,5), rep(3,5), rep(4,25), rep(5,40), rep(6,24))
b <- c(rep(0,2), rep(1,3), rep(2,9), rep(3,17), rep(4,33), rep(5,18), rep(6,18))
x <- c(rep('medical', length(a)), rep('endovascular', length(b)))
y <- c(a, b)
d1 = data.frame(x, y, study = "RESCUE-Japan LIMIT")


a <- c(rep(0,0), rep(1,3), rep(2,9), rep(3,20), rep(4,36), rep(5,32), rep(6,71))
b <- c(rep(0,2), rep(1,9), rep(2,25), rep(3,31), rep(4,27), rep(5,15), rep(6,68))
x <- c(rep('medical', length(a)), rep('endovascular', length(b)))
y <- c(a, b)
d2 = data.frame(x, y, study = "SELECT2")

a <- c(rep(0,0), rep(1,8), rep(2,18), rep(3,49), rep(4,60), rep(5,45), rep(6,45))
b <- c(rep(0,9), rep(1,19), rep(2,41), rep(3,39), rep(4,45), rep(5,27), rep(6,50))
x <- c(rep('medical', length(a)), rep('endovascular', length(b)))
y <- c(a, b)
d3 = data.frame(x, y, study = "ANGEL-ASPECT")

d = rbind(d1, d2, d3)

d$y <-factor(d$y, ordered = T)
d$x<-factor(d$x, levels = c("endovascular", "medical"))


PO_brms <- brms::brm(
  family = cumulative("logit"), # PO model
  formula = y ~ x  + (1 | study),
                      
  prior = 
    prior(normal(0, 1.5), class = "b") + 
    prior(student_t(4, 0, 2.5), class = "sd"),
  
  iter = 2000,
  chain = 4,
  cores = parallel::detectCores(),
  control = list(adapt_delta = .99),
  backend = "cmdstanr",
  seed = 123,
  data = d
  )

p1a =  pp_check(PO_brms, group = "x", type = "bars_grouped", ndraws = 1000,
                re_formula=NULL) +
  labs(title = "re_formula=NULL") + coord_cartesian(ylim = c(0, 220))

p1b = pp_check(PO_brms, group = "x", type = "bars_grouped", ndraws = 1000,
               re_formula=NA) +
  labs(title = "re_formula=NA") + coord_cartesian(ylim = c(0, 220))

p1a + p1b + patchwork::plot_layout(guides = 'collect')

tjmahr · February 24, 2023, 2:30pm

When we use re_formula = NULL, we are conditioning the predictions on the group labels, and we know the expected values for the group values well because we have seen their data. So these predictions will have less uncertainty. When we use re_formula = NA, we are 0-ing out the random intercept value and making the prediction for a statistically typical group. We only have 3 groups here. We don’t know what the typical group should look like, so we have more uncertainty.

Now, if we are making predictions for a complete new, as-yet unseen group, we need to make a prediction using a newdata argument with a unique group label and sample_new_levels = TRUE. This will incorporate the uncertainty around the statistically typical group (fixed effects) and the uncertainty for a variability in groups (SD of the random intercepts).

arthur-albuquerque · February 24, 2023, 3:50pm

How interesting, but I’m still a little confused. My model’s formula is:
y ~ x + (1 | study)

In the post above, I grouped PPCs by "x".

Now , I ran pp_check(..., group = "study").

Intervals are still wider with re_formula = NA, but actually I would expect to get an error with pp_check(..., group = "study", re_formula=NA) since:

When we use re_formula = NA , we are 0-ing out the random intercept value and making the prediction for a statistically typical group.

How does brms yield different predictions by study with re_formula = NA?

arthur-albuquerque · February 24, 2023, 6:40pm

The plot above must be a bug, given this:

> nd = d |> modelr::data_grid(x, study)
> tidybayes::predicted_rvars(object = PO_brms,
+                        newdata = nd,
+                        re_formula = NA)
# A tibble: 6 × 3
  x            study              .prediction
  <fct>        <chr>               <rvar[1d]>
1 endovascular ANGEL-ASPECT         4.8 ± 1.6
2 endovascular RESCUE-Japan LIMIT   4.8 ± 1.6
3 endovascular SELECT2              4.8 ± 1.6
4 medical      ANGEL-ASPECT         5.2 ± 1.6
5 medical      RESCUE-Japan LIMIT   5.2 ± 1.6
6 medical      SELECT2              5.2 ± 1.6

Topic		Replies	Views
Issues with hierarchical ordinal model (brms) brms posterior-predictive , brms	1	235	February 10, 2024
Random slopes with posterior_linpred brms	4	549	March 12, 2019
Posterior predictive distribution for ordinal regression brms	2	1755	March 3, 2019
Guidance on prior predictive checks in ordinal model brms ecology , prior-predictive	37	1874	March 31, 2025
Question re brms: conditional_effects General brms	4	1220	July 4, 2022

Unexpected uncertainty in posterior predicitive checks with Ordinal models

Related topics