# Unexpected uncertainty in posterior predicitive checks with Ordinal models

Hi, I am fitting a Proportional Odds model in brms (`family = cumulative("logit")`) with random-intercepts.
@joels found greater uncertainties around predictions with `posterior_predict(..., re_formula = NA)` than with `re_formula = NULL`, which we wouldn’t expect.

As per brms documention:

`re_formula` formula containing group-level effects to be considered in the prediction. If `NULL` (default), include all group-level effects; if `NA`, include no group-level effects.

What is the explanation for this phenomenon?

``````pacman::p_load(brms,
patchwork)

# Data
a <- c(rep(0,0), rep(1,3), rep(2,5), rep(3,5), rep(4,25), rep(5,40), rep(6,24))
b <- c(rep(0,2), rep(1,3), rep(2,9), rep(3,17), rep(4,33), rep(5,18), rep(6,18))
x <- c(rep('medical', length(a)), rep('endovascular', length(b)))
y <- c(a, b)
d1 = data.frame(x, y, study = "RESCUE-Japan LIMIT")

a <- c(rep(0,0), rep(1,3), rep(2,9), rep(3,20), rep(4,36), rep(5,32), rep(6,71))
b <- c(rep(0,2), rep(1,9), rep(2,25), rep(3,31), rep(4,27), rep(5,15), rep(6,68))
x <- c(rep('medical', length(a)), rep('endovascular', length(b)))
y <- c(a, b)
d2 = data.frame(x, y, study = "SELECT2")

a <- c(rep(0,0), rep(1,8), rep(2,18), rep(3,49), rep(4,60), rep(5,45), rep(6,45))
b <- c(rep(0,9), rep(1,19), rep(2,41), rep(3,39), rep(4,45), rep(5,27), rep(6,50))
x <- c(rep('medical', length(a)), rep('endovascular', length(b)))
y <- c(a, b)
d3 = data.frame(x, y, study = "ANGEL-ASPECT")

d = rbind(d1, d2, d3)

d\$y <-factor(d\$y, ordered = T)
d\$x<-factor(d\$x, levels = c("endovascular", "medical"))

PO_brms <- brms::brm(
family = cumulative("logit"), # PO model
formula = y ~ x  + (1 | study),

prior =
prior(normal(0, 1.5), class = "b") +
prior(student_t(4, 0, 2.5), class = "sd"),

iter = 2000,
chain = 4,
cores = parallel::detectCores(),
backend = "cmdstanr",
seed = 123,
data = d
)

p1a =  pp_check(PO_brms, group = "x", type = "bars_grouped", ndraws = 1000,
re_formula=NULL) +
labs(title = "re_formula=NULL") + coord_cartesian(ylim = c(0, 220))

p1b = pp_check(PO_brms, group = "x", type = "bars_grouped", ndraws = 1000,
re_formula=NA) +
labs(title = "re_formula=NA") + coord_cartesian(ylim = c(0, 220))

p1a + p1b + patchwork::plot_layout(guides = 'collect')

``````

When we use `re_formula = NULL`, we are conditioning the predictions on the group labels, and we know the expected values for the group values well because we have seen their data. So these predictions will have less uncertainty. When we use `re_formula = NA`, we are 0-ing out the random intercept value and making the prediction for a statistically typical group. We only have 3 groups here. We don’t know what the typical group should look like, so we have more uncertainty.

Now, if we are making predictions for a complete new, as-yet unseen group, we need to make a prediction using a `newdata` argument with a unique group label and `sample_new_levels = TRUE`. This will incorporate the uncertainty around the statistically typical group (fixed effects) and the uncertainty for a variability in groups (SD of the random intercepts).

1 Like

How interesting, but I’m still a little confused. My model’s formula is:
` y ~ x + (1 | study)`

In the post above, I grouped PPCs by `"x"`.

Now , I ran `pp_check(..., group = "study")`.

Intervals are still wider with `re_formula = NA`, but actually I would expect to get an error with `pp_check(..., group = "study", re_formula=NA)` since:

When we use `re_formula = NA` , we are 0-ing out the random intercept value and making the prediction for a statistically typical group.

How does `brms` yield different predictions by `study` with `re_formula = NA`?

The plot above must be a bug, given this:

``````> nd = d |> modelr::data_grid(x, study)
> tidybayes::predicted_rvars(object = PO_brms,
+                        newdata = nd,
+                        re_formula = NA)
# A tibble: 6 × 3
x            study              .prediction
<fct>        <chr>               <rvar[1d]>
1 endovascular ANGEL-ASPECT         4.8 ± 1.6
2 endovascular RESCUE-Japan LIMIT   4.8 ± 1.6
3 endovascular SELECT2              4.8 ± 1.6
4 medical      ANGEL-ASPECT         5.2 ± 1.6
5 medical      RESCUE-Japan LIMIT   5.2 ± 1.6
6 medical      SELECT2              5.2 ± 1.6
``````