Understanding intercept prior in brms

Hi Everyone,

The documentation of the brms “prior” function says something about the intercept that sounds important, but I need help understanding it.

"the intercept has its own parameter class named "Intercept" and priors can thus be specified via set_prior("<prior>", class = "Intercept")… Note that technically, this prior is set on an intercept that results when internally centering all population-level predictors around zero to improve sampling efficiency. On this centered intercept, specifying a prior is actually much easier and intuitive than on the original intercept, since the former represents the expected response value when all predictors are at their means. To treat the intercept as an ordinary population-level effect and avoid the centering parameterization, use 0 + Intercept on the right-hand side of the model formula." (link)

Suppose, for example, that I want to run a basic regression y ~ x1 * x2 + ( x1 * x2 | group ), where x1 and x2 are dummy-coded factors that are not centered (0/1). Should I give the intercept a prior that reflects my belief about the grand mean, or one that reflects my belief about the y values when x1 and x2 are zero? I always thought the latter was correct, but after reading the documentation above I am no longer sure…

Thank you,

Yes, this is an important issue, and I suspect many folks aren’t aware of it. If you use dummy coded predictors, I recommend you use the syntax of

y ~ 0 + Intercept + x1 * x2 + ( x1 * x2 | group )

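where the 0 removes the default (centered) intercept, and the Intercept term replaces it with an ordinary, uncentered intercept: the expected value of y when x1 and x2 are both zero. When you set a prior for this new Intercept, it is of class = b (with coef = Intercept) rather than class = Intercept.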
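
As a minimal sketch of the difference (the simulated data, the prior values, and the object names below are made up for illustration and are not from the question), get_prior() shows which parameter classes each formula exposes, and the priors are then attached to the matching class:

```r
library(brms)

set.seed(1)
# Hypothetical data: two uncentered 0/1 dummies and a grouping factor
d <- data.frame(
  x1 = rbinom(200, 1, 0.5),
  x2 = rbinom(200, 1, 0.5),
  group = factor(rep(1:20, each = 10))
)
d$y <- 1 + 0.5 * d$x1 - 0.3 * d$x2 + 0.2 * d$x1 * d$x2 + rnorm(200)

# Default formula: the intercept has its own class "Intercept".
# Its prior refers to the centered intercept, i.e. the expected y
# when all predictors are at their sample means.
f_default <- bf(y ~ x1 * x2 + (x1 * x2 | group))
p_default <- get_prior(f_default, data = d)
print(p_default)
priors_default <- prior(normal(1, 2), class = Intercept) +
  prior(normal(0, 1), class = b)

# 0 + Intercept: the intercept is an ordinary coefficient of class "b".
# Its prior refers to the expected y when x1 = 0 and x2 = 0.
f_plain <- bf(y ~ 0 + Intercept + x1 * x2 + (x1 * x2 | group))
p_plain <- get_prior(f_plain, data = d)
print(p_plain)
priors_plain <- prior(normal(1, 2), class = b, coef = Intercept) +
  prior(normal(0, 1), class = b)

# Fitting is left commented out because it compiles and runs Stan:
# fit_plain <- brm(f_plain, data = d, prior = priors_plain)
```

Checking the get_prior() output before fitting is a cheap way to confirm which intercept a prior will actually land on.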


Any idea how this works for non-linear models?

For example, in the formula below, what do we think is going on with the intercept priors for r, t0, etc?

fprior <- bf(ss ~ exp(r) * n - b * exp(-((n - t0)^2) / exp(rho)) / 2,
             r + t0 + b + rho ~ 1,
             nl = TRUE)

Unfortunately, I don’t use the non-linear syntax frequently enough to have a good answer.

The non-linear syntax differs in that the predictors for the non-linear parameters have no special class Intercept term. There, the intercept is always class b, and its prior is set together with the matching nlpar argument.
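
A quick way to check this for the formula asked about is get_prior(). The sketch below assumes a hypothetical data frame with columns ss and n (the real data were not posted), and the prior values are placeholders only:

```r
library(brms)

set.seed(1)
# Hypothetical stand-in data with the column names used in the formula
d <- data.frame(n = rep(1:20, times = 5))
d$ss <- exp(0.5) * d$n + rnorm(nrow(d))

fprior <- bf(ss ~ exp(r) * n - b * exp(-((n - t0)^2) / exp(rho)) / 2,
             r + t0 + b + rho ~ 1,
             nl = TRUE)

# Each non-linear parameter shows up as class "b" with coef "Intercept"
# and its own nlpar; there is no row of class "Intercept".
p_nl <- get_prior(fprior, data = d)
print(p_nl)

# Placeholder priors, one per non-linear parameter (values are illustrative)
nl_priors <- prior(normal(0, 1), nlpar = "r") +
  prior(normal(10, 5), nlpar = "t0") +
  prior(normal(0, 2), nlpar = "b") +
  prior(normal(0, 1), nlpar = "rho")
```

With r + t0 + b + rho ~ 1, each prior above applies directly to the value of that parameter, since the intercept is its only coefficient.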

For dummy-coded factors, you can still get similar behavior, estimating a separate non-linear parameter value for each level, with a 0 + f construction. Compare the two fits below, modified from the vignette on non-linear models.

library(brms)

ba <- c(2, 0.75)  # parameters for factor level 'a'
bb <- c(3, 0.5)   # parameters for factor level 'b'

x <- rnorm(100)
y <- c(rnorm(100, mean = ba[1] * exp(ba[2] * x)),
       rnorm(100, mean = bb[1] * exp(bb[2] * x)))
f <- rep(letters[1:2], each = 100)
# x is reused for both levels, so repeat it explicitly to match y and f
dat1 <- data.frame(x = rep(x, 2), y, f)

priors <- prior(normal(1, 2), nlpar = "b1") +
  prior(normal(0, 2), nlpar = "b2")

# level 'a' is the reference level (Intercept); 'fb' is the difference for level 'b'
fit1 <- brm(bf(y ~ b1 * exp(b2 * x),
               b1 + b2 ~ 1 + f, nl = TRUE),
            data = dat1, prior = priors)

# estimates intercepts for 'a' and 'b' separately
fit2 <- brm(bf(y ~ b1 * exp(b2 * x),
               b1 + b2 ~ 0 + f, nl = TRUE),
            data = dat1, prior = priors)

# nearly identical fitted values
plot(conditional_effects(fit1, "x:f"), points = TRUE)
plot(conditional_effects(fit2, "x:f"), points = TRUE)

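The fitted curves agree to within the expected simulation (MCMC) error, although the same priors act on different quantities in the two codings: on the level 'a' value and the 'b' minus 'a' difference in fit1, and on each level's own value in fit2.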
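
To see which coefficients a prior on an nlpar attaches to under each coding, the two formulas can be compared with get_prior() without fitting anything. This is only a sketch: the data are re-simulated here, and the object names and coefficient-specific prior values are illustrative assumptions:

```r
library(brms)

set.seed(1)
x <- rnorm(100)
dat <- data.frame(x = rep(x, 2), f = rep(letters[1:2], each = 100))
dat$y <- c(rnorm(100, mean = 2 * exp(0.75 * x)),
           rnorm(100, mean = 3 * exp(0.5 * x)))

nl_ref <- bf(y ~ b1 * exp(b2 * x), b1 + b2 ~ 1 + f, nl = TRUE)
nl_lvl <- bf(y ~ b1 * exp(b2 * x), b1 + b2 ~ 0 + f, nl = TRUE)

# 1 + f: coefficients "Intercept" (level 'a') and "fb" (b minus a)
p_ref <- get_prior(nl_ref, data = dat)
print(p_ref)

# 0 + f: coefficients "fa" and "fb" (one value per level)
p_lvl <- get_prior(nl_lvl, data = dat)
print(p_lvl)

# With 1 + f, a tighter prior can be given to the difference alone
priors_ref <- prior(normal(1, 2), nlpar = "b1", coef = "Intercept") +
  prior(normal(0, 1), nlpar = "b1", coef = "fb") +
  prior(normal(0, 2), nlpar = "b2")
```

Which coding is more convenient depends on whether prior knowledge is easier to state about each level's value or about the difference between levels.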