Understanding intercept prior in brms

nitzan_shahar · February 4, 2024, 6:58am

Hi Everyone,

The documentation of brms “prior” function says something about the intercept that sounds important, but I need help in understanding that.

"the intercept has its own parameter class named "Intercept" and priors can thus be specified via set_prior("<prior>", class = "Intercept")… Note that technically, this prior is set on an intercept that results when internally centering all population-level predictors around zero to improve sampling efficiency. On this centered intercept, specifying a prior is actually much easier and intuitive than on the original intercept, since the former represents the expected response value when all predictors are at their means. To treat the intercept as an ordinary population-level effect and avoid the centering parameterization, use 0 + Intercept on the right-hand side of the model formula.
" (link)

If for example I want to run a basic regression y ~ x1 * x2 + ( x1 * x2 | group ) where x1 and x2 are dummy coded factors that are not centered (0/1). Should I give the intercept a prior that will reflect my belief regarding the grand mean, or should I give the intercept a prior that reflect my belief regarding y values when x1 and x2 are zero? I always thought the latter is the correct way, but reading the above documentation I am now not sure…

Thank you,
Nitzan

Solomon · February 4, 2024, 9:04pm

Yes, this is an important issue, and I suspect many folks aren’t aware of it. If you use dummy coded predictors, I recommend you use the syntax of

y ~ 0 + Intercept + x1 * x2 + ( x1 * x2 | group )

where the 0 removes the default intercept, and the Intercept part replaces it with a new “intercept.” When you set your priors for your new Intercept, it will be of class = b.

Alasdair_Clarke · February 5, 2024, 4:05pm

Any idea how this work for non-linear models?

For example, in the formula below, what do we think is going on with the intercept priors for r, t0, etc?

fprior <- bf(ss ~ exp(r) * n  - b*exp(-((n-t0)^2)/exp(rho)) /2, 
             r + t0 + b + rho ~ 1,
             nl = TRUE)

Solomon · February 5, 2024, 4:37pm

Unfortunately, I don’t use the non-linear syntax frequently enough to have a good answer.

wpetry · February 5, 2024, 5:58pm

The non-linear syntax is different in not having a special class Intercept term in the predictors for the non-linear parameters. Here, the intercept is always class b.

For dummy coded factors, you can still achieve similar behavior of fitting a nonlinear parameter value for each level with a 0 + fact construction. Compare the two fits below, modified from the vignette on nonlinear models.

ba <- c(2, 0.75)  # parameters for factor level 'a'
bb <- c(3, 0.5)  # parameters for factor level 'b'

x <- rnorm(100)
y <- c(rnorm(100, mean = ba[1] * exp(ba[2] * x)),
       rnorm(100, mean = bb[1] * exp(bb[2] * x)))
f <- rep(letters[1:2], each = 100)
dat1 <- data.frame(x, y, f)

priors <- prior(normal(1, 2), nlpar = "b1") +
  prior(normal(0, 2), nlpar = "b2")

# assumes level 'a' is the intercept
fit1 <- brm(bf(y ~ b1 * exp(b2 * x),
               b1 + b2 ~ 1 + f, nl = TRUE),
            data = dat1, prior = priors)

# estimates intercepts for 'a' and 'b' separately
fit2 <- brm(bf(y ~ b1 * exp(b2 * x),
               b1 + b2 ~ 0 + f, nl = TRUE),
            data = dat1, prior = priors)

# same fitted values
plot(conditional_effects(fit1, "x:f"), points = TRUE)
plot(conditional_effects(fit2, "x:f"), points = TRUE)

The fits are identical within the expected simulation error.

Aude · April 18, 2024, 3:52pm

Hello,
I have a question that is related.
I ran a binomial bayesian model with brms :

prior_uniform <- set_prior("beta(1, 1)", class = "b", lb = 0, ub = 1)

model_with_offset <- brm(
  formula = presence | trials(1) ~ age_class + offset(log(tot_number_utterance)) + (1|ID) + (1|group) , 
  data = Data_Call,
  family = binomial(),
  prior = prior_uniform,
  chains = 4,
  iter = 8000,
  warmup = 4000,
  seed = 123,
  control = list(adapt_delta = 0.999),
  sample_prior = 'yes'
)

I want to compare my three age classes, in order to know if the presence is significantly different (The intercept has been assigned to the adult age class) :

hypothesis(PROPCALL_model_with_offset, c("Intercept = age_classjuvenile", "Intercept = age_classinfant, age_classjuvenile= age_classinfant", point_estimate = 'mean')

Thus, I get the following error :
Sampling from the prior of an overall intercept is not possible by default. See the documentation of the ‘sample_prior’ argument in help(‘brm’).

This is what I found in the documentation :

Please also note that prior draws for the overall intercept are not obtained by default for technical reasons.

I tried to find more information, but I couldn’t really figure out if there was a solution.

I would be grateful for any help!

wpetry · April 18, 2024, 4:58pm

By default, your formula will fit the first factor level (or first alphabetically if a character) of age_class as the intercept. I think you could change the first part of your formula to ... ~ 0 + age_class + ... to make the model fit a coefficient for each level of age_class. I’m not totally positive about the behavior of hypothesis(...), and I can’t easily check without sample data for your problem. But I think you should be able to compare age classes to the reference level.

I think you might be missing open/close " between your hypotheses 2 & 3.

Because you only have 1 trial for a binomial distribution, you can use family = bernoulli() and remove the trials(1) argument from your formula. The Bernoulli is a special case of the binomial and your model will sample faster/more efficiently.

If you’re still having trouble, you’re more likely to get fast help by opening a new topic. The ☑︎ by the topic title means that it’s been solved, so folks are often less likely to notice new comments/questions. You can still link to this or other related topics to help folks get a better sense of your problem and what you’ve already tried.

Aude · April 18, 2024, 5:18pm

Thank you very much !
I changed my formula and the family, it is indeed much faster, thanks.
I don’t get the warning message anymore, but this I get Na in the hypothesis() output :

Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (Intercept)-(age_... = 0    -3.66      0.41    -4.54    -2.98         NA        NA    *
2 (Intercept)-(age_... = 0    -3.60      0.41    -4.51    -2.95         NA        NA    *
3 (age_classjuvenil... = 0     0.06      0.23    -0.39     0.51       1.95      0.66

I guesss that I still can conclude based on the estimate and its credibility interval though

Thanks, I’m indeed new here and I didn’t really know how to do ! I’ll open a new topic if I don’t find the solution

Solomon · April 18, 2024, 6:06pm

Maybe let’s back up, @Aude. Unless you specify otherwise, family = binomial() uses the logit link, by default. That means your intercept and age_class parameters are on the log-odds scale, not the natural probability metric. All of this means that your beta(1, 1) prior will cause problems. Whereas the \beta coefficients are on the log-odds scale, which ranges from negative to positive infinity, your beta(1, 1) prior is constrained to between 0 and 1. If you intend to fit your model on the log-odds scale (which I recommend), you are better served using Gaussian or Student-t priors for the \beta coefficients. Personally, I like normal(0, 1) as a generic default prior for parameters on the log-odds scale.

Anyway, it looks like this is your first time posting on the Stan forums, @Aude. Welcome!

Aude · April 23, 2024, 3:04pm

Oh, that was a big mistake ! Thank you very much for the explaination, it was really helpful. It works quit well know.

cour10eygrace · May 12, 2025, 5:08pm

thanks for this great explanation. Would you also recommend this syntax if you had a mix of continuous and dummy coded (factor) predictors in the same model? For example if x1 and x2 were continuous in this case and another term x3 was a factor.

Solomon · May 12, 2025, 5:34pm

By “this syntax,” are you asking about my generic normal(0, 1) prior recommendation? If so, then yes, that’s my go-to prior for these kinds of models. But that’s just a place to start the conversation. For a real scientific analysis, you might want to go beyond generic defaults. McElreath has some nice discussions of how to think about this in his second edition (https://xcelab.net/rm/), and Gelman and friends have a nice discussion of this in their Bayesian Workflow paper ([2011.01808] Bayesian Workflow).

Topic		Replies	Views
Interpretation of priors on nonlinear parameters brms	2	1110	May 21, 2019
Mixed up with the understanding of y~0+ intercept+x and y~x Modeling techniques , specification	1	1481	June 18, 2020
Cannot change prior of population-level Intercept brms	2	1175	August 15, 2018
Non-centered population-level intercept brms	5	1159	September 3, 2020
Lb in brms prior brms	4	1445	October 6, 2018

Understanding intercept prior in brms

Related topics