Confusion about random intercept in binary logistic regression in brms

I have a toy example below, in which I might be trying to update an estimate for some proportion in the population. Imagine some new data have been collected, but we also have strong prior information that the proportion in the population for this quantity should be close to 10% (in the data, it is much higher). I'm trying to update the estimate on the basis of the new information, but in this case I set a very strong informative prior.

If I ignore age, the strongly informative prior brings the population estimate for the proportion out at around 16% (in the simulated sample I ran, the raw proportion was 22%). So this estimate is a compromise between the very strong prior centered at 10% and the data.

However, when I include a random effect for age, the intercept estimated in the model remains low, but the age random effects seem to compensate and push the estimated percentage much higher. All of the ranef estimates are positive, and the estimated population average comes out at around 20%.

I had thought the random effects for each age level should average out at 0, and that a prior on the intercept could still be used to edge the final estimate up or down quite a bit.

Can anyone explain what is going on here, and, if I did want to use prior information like this to shift an estimate down on the basis of prior data, how I could do that?

library(tidyverse)
library(tidybayes)   # needed for add_epred_draws()
library(brms)

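# simulate 1000 respondents; note that sample() rescales prob weights that don't
# sum to 1, so the outcome rates below are roughly 10%, 17%, 24%, and 30% by age group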
mock_data <-
  tibble(age = sample(c("18-29",
                        "30-49",
                        "50-64",
                        "65+"),
                  size = 1000,
                  replace = TRUE)) %>% 
  rowwise() %>% 
  mutate(outcome = case_when(age == "18-29" ~ sample(c(0, 1), 1, prob = c(.9, .1)),
                             age == "30-49" ~ sample(c(0, 1), 1, prob = c(.85, .175)),
                             age == "50-64" ~ sample(c(0, 1), 1, prob = c(.8, .25)),
                             age == "65+" ~ sample(c(0, 1), 1, prob = c(.75, .325))))

mean(mock_data$outcome)

# aim for 10%
qlogis(.1)
# roughly -2.2
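# sanity check on the probability scale: plogis(-2.2) is about 0.0998, so a
# normal(-2.2, .1) prior on the intercept is tightly centered near 10%
plogis(-2.2)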

prior_without_ranef <-
  c(set_prior("normal(-2.2, .1)", class = "Intercept")
  )

prior_with_ranef <-
  c(set_prior("normal(-2.2, .1)", class = "Intercept"),
    set_prior("normal(0, .1)", class = "sd")
  )

my_formula_without_ranef <-
  outcome ~ 1

my_formula_with_ranef <-
  outcome ~ 1 + (1 | age)

fit_without_ranef <-
  brm(formula = my_formula_without_ranef,
      family = bernoulli(),
      prior = prior_without_ranef,
      data = mock_data,
      chains = 4,
      cores = 4,
      iter = 1500,
      warmup = 500,
      backend = 'cmdstanr')

fit_with_ranef <-
  brm(formula = my_formula_with_ranef,
      family = bernoulli(),
      prior = prior_with_ranef,
      data = mock_data,
      chains = 4,
      cores = 4,
      iter = 1500,
      warmup = 500,
      backend = 'cmdstanr')

ranef(fit_with_ranef)
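
# one way to check whether the age effects average out at 0: summarise the
# posterior means of the group-level intercepts (in the run described above
# they were all positive rather than centering on zero)
ranef(fit_with_ranef)$age[, , "Intercept"] %>%
  as_tibble(rownames = "age") %>%
  summarise(mean_of_ranefs = mean(Estimate))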

posterior_without_ranef <-
  add_epred_draws(object = fit_without_ranef,
                  newdata = tibble(id = 1),
                  ndraws = 1000)

mean(posterior_without_ranef$.epred)

posterior_with_ranef <-
  add_epred_draws(object = fit_with_ranef,
                  newdata = tibble(age = rep(c("18-29",
                                               "30-49",
                                               "50-64",
                                               "65+"),
                                             each = 250)),
                  ndraws = 1000)

posterior_with_ranef %>% 
  group_by(age) %>% 
  summarise(mean = mean(.epred))

mean(posterior_with_ranef$.epred)
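
# for comparison, dropping the group-level terms with re_formula = NA gives the
# population-level (intercept-only) expectation, which should track the low
# intercept rather than the ~20% average over the age effects
posterior_population_only <-
  add_epred_draws(object = fit_with_ranef,
                  newdata = tibble(age = "18-29"),  # value ignored when re_formula = NA
                  re_formula = NA,
                  ndraws = 1000)

mean(posterior_population_only$.epred)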

You have very strong prior-data conflict, and the model is contorting into a configuration that best satisfies both. I think the immediate conclusion that I would draw is that the proportion isn’t stationary, i.e. that the data that inform your prior aren’t actually drawn from the same population as the data that inform your likelihood.

If you really want this, you could specify the random effect vector in raw Stan as a sum_to_zero_vector. However, even then the model would likely contort to satisfy the zero-sum constraint by estimating big negative values for effect levels with less new data, and big positive values for effect levels with lots of new data.
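
A rough brms-level alternative, if you would rather not drop to raw Stan, is to code age as a sum-to-zero contrast fixed effect instead of a random effect, so that the four implied age effects sum to zero exactly by construction. The sketch below assumes the contrasts attribute on the factor is carried through to the design matrix:

# sketch: hard sum-to-zero constraint via sum-coded fixed effects for age
mock_data_sum <- mock_data %>%
  ungroup() %>%
  mutate(age = factor(age))
contrasts(mock_data_sum$age) <- contr.sum(4)

fit_sum_coded <-
  brm(formula = outcome ~ 1 + age,
      family = bernoulli(),
      prior = c(set_prior("normal(-2.2, .1)", class = "Intercept"),
                set_prior("normal(0, .1)", class = "b")),
      data = mock_data_sum,
      chains = 4,
      cores = 4,
      iter = 1500,
      warmup = 500,
      backend = 'cmdstanr')

With this coding the intercept is the unweighted mean of the four age-level log-odds, so the tight normal(-2.2, .1) prior pulls that average toward 10% while the age contrasts only redistribute around it (keeping in mind that brms places the class = "Intercept" prior on the intercept with centered predictors, which is close to the raw intercept here because the age groups are roughly balanced).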

The more principled thing to do in your position might be to come up with some understanding of why your new data looks so different from the previous data, and how you can appropriately account for that difference in your modeling framework.

Thanks for the quick response, jscolar.

the data that inform your prior aren’t actually drawn from the same population as the data that inform your likelihood

Yeah, I think that is very likely correct. Rather than trying to use a prior to shift the estimate, I will need to take the estimate as is but highlight the reasons why it is likely an overestimate (there are reasons to think the number would be higher in the sample I've got due to some aspects of the sample being different).