Default prior for the intercept in multilevel models is based on the data?

vandenman · November 18, 2019, 12:10pm

Operating System: Ubuntu 18.04
brms Version: 2.10.0

The default prior that brms generates for the intercept seems to use the mean of the outcome variable for the mean of the prior distribution. I always thought that informing the prior by the data is “double-dipping”, i.e., using the data twice, and leads to overconfident inference.

Here some R code to replicate the priors:

library(brms)

get_intercept_prior <- function(stancode) {
  code_by_line <- strsplit(stancode, "\n")[[1]]
  code_by_line[grepl("student_t_lpdf\\(Intercept ", code_by_line)]
}

# a model
formula <- y ~ 1 + (1 | task) + (1 || school / student)

# some data for brms::make_stancode
nTasks    <- 3
nSchools  <- 10
nStudents <- 200

idxTasks   <- sample(nTasks, nStudents, replace = TRUE)
idxSchools <- sample(nSchools, nStudents, replace = TRUE)
idxStudent <- seq_len(nStudents)

df <- data.frame(
  task    = idxTasks,
  school  = idxSchools,
  student = idxStudent
)

# generate stan code for multiple values of y

df$y <- 0
stancode <- make_stancode(formula, df) # mean of 0
get_intercept_prior(stancode)
# target += student_t_lpdf(Intercept | 3, 0, 10);

df$y <- 5
stancode <- make_stancode(formula, df) # mean of 5
get_intercept_prior(stancode)
# target += student_t_lpdf(Intercept | 3, 5, 10);

df$y <- 10
stancode <- make_stancode(formula, df) # mean of 10
get_intercept_prior(stancode)
# target += student_t_lpdf(Intercept | 3, 10, 10);

Could someone perhaps explain why this is a good idea? Does it improve the sampling in Stan? Can anybody recommend a paper that advocates for/ explains this choice of prior?

martinmodrak · December 1, 2019, 1:41pm

You are correct that technically this is problematic. Whenever possible, you should specify the prior on intercept manually using your domain knowledge. However, I think it is a defensible default: having only improper prior means no regularization and a risk of problematic convergence. Having a fixed default prior risks skewing the results if the scale of the data is very different from what the default expects. OTOH the student_t distribution used is almost guaranteed to contain the true value, regularize the results slightly while only negligibly influencing the posterior.

In almost any real scenario the biases of your measurement and the the fact that your model does not match the reality perfectly will have orders-of-magnitude larger influence on your inference than this double dipping.

And it turns out the default prior for the intercept does not use the mean of the outcome variable, but probably median as can be verified by adding this to your source code:

df$y <- sample(c(2,15), nrow(df), replace = TRUE)
median(df$y)
mean(df$y)
stancode <- make_stancode(formula, df)
get_intercept_prior(stancode)

However, this is not documented AFAIK, so maybe @paul.buerkner would like to know about this. Actually the brms Overview vignette says:

By default, population level parameters have animproper flat prior over the reals

So maybe this changed recently and the documentation hasn’t caught up?

paul.buerkner · December 2, 2019, 12:27pm

You are right. The brms overview vignette resembles the first brms paper as published in 2017 and I have since changed the default priors in some ways. I will see how to best update the doc accordingly.

Topic		Replies	Views
How does brms use the given data to adjust the default priors? brms	5	1046	August 11, 2020
Spotting the default priors in stancode returned by brms brms techniques , specification , brms	1	479	July 16, 2022
Prior on intercept when using offset in brms brms specification	3	664	June 9, 2020
Understanding intercept prior in brms brms	11	1935	May 12, 2025
Question about posterior distribution over intercept parameter brms brms	14	1679	May 16, 2021

Default prior for the intercept in multilevel models is based on the data?

Related topics