Setting prior to a bernoulli brms model

I’m stuck on setting an informative prior to a brms bernoulli model using logit link. Possibly the logit link causes confusion for me.

I generated 5-year data. Each year, males proportion increases by ~10 percentage points.

n = 100
a = tibble(sex = rep(c("m", "m", "m", "m", "m", "m", "m", "f", "f", "f"), length.out = n), year = rep(c(3, 4, 5), length.out = n))
b = tibble(sex = rep(c("m", "m", "m", "m", "m", "m", "f", "f", "f", "f"), length.out = n), year = rep(c(2, 3, 4), length.out = n))
c = tibble(sex = rep(c("m", "m", "m", "m", "m", "f", "f", "f", "f", "f"), length.out = n), year = rep(c(1, 2, 3), length.out = n))
d = tibble(sex = rep(c("m", "m", "m", "m", "f", "f", "f", "f", "f", "f"), length.out = n), year = rep(c(0, 1, 2), length.out = n))
e = rbind(a, b)
f = rbind(e, c)
df = rbind(f, d)
df = df %>% mutate(sex = as.factor(sex))
df %>% ggplot(aes(year, fill = sex)) + geom_bar(position = "fill") + ylab("proportion") + scale_fill_manual(values = c("red", "skyblue"))

I’d like to run a model as follows

m = brm(sex ~ year, family = "bernoulli", data = df, sample_prior = TRUE, prior = prior)

but how to code an informative prior for the model?

I tried the following that did not work. I assumed that each year males proportion would increase by 10 percentage points.

prior = c(set_prior("normal(10, 5)", class = "b", coef = "year"))

This gives highly wide (non-informative) prior for my model.

m %>% posterior_samples() %>% ggplot() + geom_density(aes(b_year), color = "black") + geom_density(aes(prior_b_year), color = "red") + xlab("b_year (black) and prior_b_year (red)")

How should I code an informative prior for such model?

The prior you see in your plot is exactly the a normal distribution centered on ten with sd five that you specified in your model. If you want a more strongly informative prior, you might consider decreasing the standard deviation of your prior.

1 Like

Big thanks! Good point about decreasing the SD. I am thinking of making another error in integrating my beliefs into the model. Seems I should not set my informative prior reflecting reality in percentage scale as the posterior values differ a lot compared to prior. They should match as I set my prior as according to the simulated example.

Updated posterior vs prior, SD equals now 1.

I know that males proportion increase by 10 percentage points each year in my simulated example. I would like to use this “simulated background” knowledge for an informative prior. Should I transform my prior to another scale? Seems the model posterior is in log odds scale? How should I meaningfully define priors in this scale?

My previous prior (not in correct scale)
prior = c(set_prior("normal(10, 1)", class = "b", coef = "year"))

Two things. First, and more trivially, an increase by 10 percentage points is an increase in probability of 0.1, not 10. Second, and more fundamentally, Bernoulli models in brms (and most places) use a logit link by default. So it is impossible to sustain a linear change in probability over any substantial range of a predictor (and this makes sense; otherwise you’d quickly have probabilities outside of the interval [0,1]. If you really think that the probability should be linear in your predictor over the range of values in your data, then you could experiment with using an identity link and some strong priors/bounds to keep the model away from probabilities greater than 1 or less than 0. I’m unsure whether or not this is possible in brms.

1 Like