Default prior for the intercept in multilevel models is based on the data?

  • Operating System: Ubuntu 18.04
  • brms Version: 2.10.0

The default prior that brms generates for the intercept seems to use the mean of the outcome variable for the mean of the prior distribution. I always thought that informing the prior by the data is “double-dipping”, i.e., using the data twice, and leads to overconfident inference.

Here some R code to replicate the priors:

library(brms)

get_intercept_prior <- function(stancode) {
  code_by_line <- strsplit(stancode, "\n")[[1]]
  code_by_line[grepl("student_t_lpdf\\(Intercept ", code_by_line)]
}

# a model
formula <- y ~ 1 + (1 | task) + (1 || school / student)

# some data for brms::make_stancode
nTasks    <- 3
nSchools  <- 10
nStudents <- 200

idxTasks   <- sample(nTasks, nStudents, replace = TRUE)
idxSchools <- sample(nSchools, nStudents, replace = TRUE)
idxStudent <- seq_len(nStudents)

df <- data.frame(
  task    = idxTasks,
  school  = idxSchools,
  student = idxStudent
)

# generate stan code for multiple values of y

df$y <- 0
stancode <- make_stancode(formula, df) # mean of 0
get_intercept_prior(stancode)
# target += student_t_lpdf(Intercept | 3, 0, 10);

df$y <- 5
stancode <- make_stancode(formula, df) # mean of 5
get_intercept_prior(stancode)
# target += student_t_lpdf(Intercept | 3, 5, 10);

df$y <- 10
stancode <- make_stancode(formula, df) # mean of 10
get_intercept_prior(stancode)
# target += student_t_lpdf(Intercept | 3, 10, 10);

Could someone perhaps explain why this is a good idea? Does it improve the sampling in Stan? Can anybody recommend a paper that advocates for/ explains this choice of prior?

1 Like

You are correct that technically this is problematic. Whenever possible, you should specify the prior on intercept manually using your domain knowledge. However, I think it is a defensible default: having only improper prior means no regularization and a risk of problematic convergence. Having a fixed default prior risks skewing the results if the scale of the data is very different from what the default expects. OTOH the student_t distribution used is almost guaranteed to contain the true value, regularize the results slightly while only negligibly influencing the posterior.

In almost any real scenario the biases of your measurement and the the fact that your model does not match the reality perfectly will have orders-of-magnitude larger influence on your inference than this double dipping.

And it turns out the default prior for the intercept does not use the mean of the outcome variable, but probably median as can be verified by adding this to your source code:

df$y <- sample(c(2,15), nrow(df), replace = TRUE)
median(df$y)
mean(df$y)
stancode <- make_stancode(formula, df)
get_intercept_prior(stancode)

However, this is not documented AFAIK, so maybe @paul.buerkner would like to know about this. Actually the brms Overview vignette says:

By default, population level parameters have animproper flat prior over the reals

So maybe this changed recently and the documentation hasn’t caught up?

2 Likes

You are right. The brms overview vignette resembles the first brms paper as published in 2017 and I have since changed the default priors in some ways. I will see how to best update the doc accordingly.

2 Likes