I have a toy example below, in which I might be trying to update an estimate for some proportion in the population. Imagine there is some new data collected, but we have also have some strong prior information that the proportion in the population for this quantity should be close to 10% (in the data, it is much higher). I’m trying to update the estimate on the basis of the new information, but I set in this case a very strong informative prior.
If I ignore age in this case, then the use of the strongly informative prior brings the population estimate for the proportion out at around 16% (in the simulated sample I ran, it was 22%). So this estimate is some compromise between the very strong prior centered at 10% and the data.
However, when I include a random effect for age, although the intercept estimated in the model remains low, it is like the age random effect is used to compensate and get the estimated percentage a lot higher. All of the ranef estimates are positive, and the population estimated average comes out at 20%.
I had thought the random effects for each age level should average out at 0, and that a prior on the intercept could still be used to edge the final estimate up or down quite a bit.
Can anyone explain what is the deal here, and if I did want to use such prior information so as to shift an estimate down on the basis of prior data, how I could do that?
library(tidyverse)
library(brms)
mock_data <-
tibble(age = sample(c("18-29",
"30-49",
"50-64",
"65+"),
size = 1000,
replace = TRUE)) %>%
rowwise() %>%
mutate(outcome = case_when(age == "18-29" ~ sample(c(0, 1), 1, prob = c(.9, .1)),
age == "30-49" ~ sample(c(0, 1), 1, prob = c(.85, .175)),
age == "50-64" ~ sample(c(0, 1), 1, prob = c(.8, .25)),
age == "65+" ~ sample(c(0, 1), 1, prob = c(.75, .325))))
mean(mock_data$outcome)
# aim for 10%
qlogis(.1)
# roughly -2.2
prior_without_ranef <-
c(set_prior("normal(-2.2, .1)", class = "Intercept")
)
prior_with_ranef <-
c(set_prior("normal(-2.2, .1)", class = "Intercept"),
set_prior("normal(0, .1)", class = "sd")
)
my_formula_without_ranef <-
outcome ~ 1
my_formula_with_ranef <-
outcome ~ 1 + (1 | age)
fit_without_ranef <-
brm(formula = my_formula_without_ranef,
family = bernoulli(),
prior = prior_without_ranef,
data = mock_data,
chains = 4,
cores = 4,
iter = 1500,
warmup = 500,
backend = 'cmdstanr')
fit_with_ranef <-
brm(formula = my_formula_with_ranef,
family = bernoulli(),
prior = prior_with_ranef,
data = mock_data,
chains = 4,
cores = 4,
iter = 1500,
warmup = 500,
backend = 'cmdstanr')
ranef(fit_with_ranef)
posterior_without_ranef <-
add_epred_draws(object = fit_without_ranef,
newdata = tibble(id = 1),
ndraws = 1000)
mean(posterior_without_ranef$.epred)
posterior_with_ranef <-
add_epred_draws(object = fit_with_ranef,
newdata = tibble(age = rep(c("18-29",
"30-49",
"50-64",
"65+"),
each = 250)),
ndraws = 1000)
posterior_with_ranef %>%
group_by(age) %>%
summarise(mean = mean(.epred))
mean(posterior_with_ranef$.epred)