Hello everyone,
I’ve run into a weird issue while designing teaching materials: a prior predictive simulation in brms is not showing the behaviour I expected. I’d like to better understand what brms is doing, or to clear up a conceptual misunderstanding on my end.
The example concerns dummy coding of categorical variables and how it can create problems when setting priors. To keep things concrete, I will walk through a reproducible example. Below I create a variable ‘x’ with two levels, 0 and 1, and then simulate a continuous outcome y from a normal distribution whose mean depends on the level of x (10 when x = 0, 12 when x = 1, SD 1).
```r
# Simulating data
set.seed(2024)  # fix the RNG so the example is reproducible
x <- rep(c(0, 1), times = 50)
y <- rnorm(100, mean = 10 + x * 2, sd = 1)
# Creating the data frame and converting x to a factor
df <- data.frame(x, y)
df$x <- factor(df$x)
```
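To make the dummy coding concrete, base R’s `model.matrix()` shows the design matrix that R’s default treatment contrasts produce for this factor: an intercept column plus a single indicator column, `x1`, with ‘0’ as the reference level. The sketch below rebuilds only `x` (y is not needed). Note that, because the design is balanced, the `x1` column averages 0.5.

```r
# Rebuild the same two-level factor as above
x <- factor(rep(c(0, 1), times = 50))

# Default treatment (dummy) coding: "0" is the reference level,
# so the design matrix has an intercept column and one dummy column "x1"
X <- model.matrix(~ x)
head(X, 4)

# Column means: the intercept column is all 1s; the dummy column averages 0.5
colMeans(X)
```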
Because level ‘0’ is the reference level represented by the Intercept, and the coefficient for ‘1’ is the difference from the Intercept, I would expect any set of independent priors to imply greater uncertainty about the mean of y when x = 1: by construction, that mean carries both the uncertainty of the Intercept prior and the uncertainty of the difference. I checked that this is the case with a simple simulation:
```r
# Running a simple prior simulation
x0 <- rnorm(1e6, mean = 8, sd = 3)                                 # prior on the mean when x = 0
x1 <- rnorm(1e6, mean = 8, sd = 3) + rnorm(1e6, mean = 0, sd = 2)  # Intercept prior + difference prior
# Means are equal, as expected
mean(x0) # 7.998458
mean(x1) # 7.995305
# x1 has the larger SD, as expected
sd(x0) # 2.999817
sd(x1) # 3.605881
```
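The simulated SD of `x1` matches what adding independent variances predicts. A short derivation, assuming the two priors are independent as in the code above:

```latex
\begin{aligned}
\alpha &\sim \mathcal{N}(8,\, 3^2), \qquad \beta \sim \mathcal{N}(0,\, 2^2), \qquad \alpha \perp \beta \\
\mu_0 &= \alpha, \qquad \mu_1 = \alpha + \beta \\
\operatorname{Var}(\mu_1) &= \operatorname{Var}(\alpha) + \operatorname{Var}(\beta) = 9 + 4 = 13 \\
\operatorname{SD}(\mu_0) &= 3, \qquad \operatorname{SD}(\mu_1) = \sqrt{13} \approx 3.606
\end{aligned}
```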
However, when I run a prior predictive check with the `brm()` function, the uncertainty about the two means is not different. I run the prior check using the code below.
```r
library(brms)

prior_check <- brm(y ~ x,
                   data = df,
                   prior = c(
                     prior(normal(8, 3), class = "Intercept"),
                     prior(normal(0, 2), class = "b"),
                     prior(normal(0, 3), class = "sigma")),
                   sample_prior = "only")  # draw from the priors only, ignoring y
```
First I visualized the prior means. The uncertainty about the two means looked about the same, which I didn’t understand, so I inspected the draws myself, which led to the same result:
```r
# Getting a data frame of prior draws
prior_df <- as_draws_df(prior_check)
# Means of the two group means: similar, as expected
mean(prior_df$b_Intercept)                  # 7.97645
mean(prior_df$b_Intercept + prior_df$b_x1)  # 8.072909
# SDs of the two group means: also similar, contrary to my expectation
sd(prior_df$b_Intercept)                    # 3.175866
sd(prior_df$b_Intercept + prior_df$b_x1)    # 3.11842
```
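One possible explanation, stated as an assumption to check rather than a confirmed diagnosis: the brms `brmsformula` documentation says that, when the model has an intercept, brms centres the columns of the population-level design matrix, so a prior with `class = "Intercept"` is placed on the intercept of the *centred* design matrix rather than on the reported `b_Intercept`. In this balanced design the `x1` column averages 0.5, so under that assumption the `normal(8, 3)` prior applies to the average of the two group means, and each group mean sits 0.5 × `b_x1` away from it. The base-R sketch below mimics that parameterization; the variable names (`intercept_c`, `mu0`, `mu1`) are hypothetical and not brms output.

```r
set.seed(1)
n <- 1e6

# Assumed brms-style parameterization: the Intercept-class prior applies to the
# intercept of the centred design matrix (x1 column minus its mean of 0.5)
intercept_c <- rnorm(n, mean = 8, sd = 3)  # prior(normal(8, 3), class = "Intercept")
b_x1        <- rnorm(n, mean = 0, sd = 2)  # prior(normal(0, 2), class = "b")

# Back-transformed group means, mirroring b_Intercept and b_Intercept + b_x1
mu0 <- intercept_c - 0.5 * b_x1  # mean of y when x = 0
mu1 <- intercept_c + 0.5 * b_x1  # mean of y when x = 1

# Both SDs should be close to sqrt(3^2 + 0.5^2 * 2^2) = sqrt(10), about 3.16
sd(mu0)
sd(mu1)
sqrt(10)
```

If this is what is happening, equal SDs of about 3.16 are what one would expect, which is in line with the 3.18 and 3.12 seen in the brms draws. The `brmsformula` documentation describes two ways to put the prior on the reference-level mean instead: `bf(y ~ x, center = FALSE)`, or the `y ~ 0 + Intercept + x` syntax, where the intercept becomes an ordinary `b` coefficient whose prior is set with `class = "b", coef = "Intercept"`.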
All of the materials I’ve read, as well as many posts on this forum and elsewhere, describe this problem with setting priors on dummy-coded variables, yet I don’t seem to run into it in brms. Any insight into why this happens in brms, and/or what I’m misunderstanding about setting priors, would be awesome!
