Understanding intercept prior in brms

Hi Everyone,

The documentation of brms “prior” function says something about the intercept that sounds important, but I need help in understanding that.

"the intercept has its own parameter class named "Intercept" and priors can thus be specified via set_prior("<prior>", class = "Intercept")… Note that technically, this prior is set on an intercept that results when internally centering all population-level predictors around zero to improve sampling efficiency. On this centered intercept, specifying a prior is actually much easier and intuitive than on the original intercept, since the former represents the expected response value when all predictors are at their means. To treat the intercept as an ordinary population-level effect and avoid the centering parameterization, use 0 + Intercept on the right-hand side of the model formula.
" (link)

If for example I want to run a basic regression y ~ x1 * x2 + ( x1 * x2 | group ) where x1 and x2 are dummy coded factors that are not centered (0/1). Should I give the intercept a prior that will reflect my belief regarding the grand mean, or should I give the intercept a prior that reflect my belief regarding y values when x1 and x2 are zero? I always thought the latter is the correct way, but reading the above documentation I am now not sure…

Thank you,
Nitzan

Yes, this is an important issue, and I suspect many folks aren’t aware of it. If you use dummy coded predictors, I recommend you use the syntax of

y ~ 0 + Intercept + x1 * x2 + ( x1 * x2 | group )

where the 0 removes the default intercept, and the Intercept part replaces it with a new “intercept.” When you set your priors for your new Intercept, it will be of class = b.

4 Likes

Any idea how this work for non-linear models?

For example, in the formula below, what do we think is going on with the intercept priors for r, t0, etc?

fprior <- bf(ss ~ exp(r) * n  - b*exp(-((n-t0)^2)/exp(rho)) /2, 
             r + t0 + b + rho ~ 1,
             nl = TRUE)

Unfortunately, I don’t use the non-linear syntax frequently enough to have a good answer.

The non-linear syntax is different in not having a special class Intercept term in the predictors for the non-linear parameters. Here, the intercept is always class b.

For dummy coded factors, you can still achieve similar behavior of fitting a nonlinear parameter value for each level with a 0 + fact construction. Compare the two fits below, modified from the vignette on nonlinear models.

ba <- c(2, 0.75)  # parameters for factor level 'a'
bb <- c(3, 0.5)  # parameters for factor level 'b'

x <- rnorm(100)
y <- c(rnorm(100, mean = ba[1] * exp(ba[2] * x)),
       rnorm(100, mean = bb[1] * exp(bb[2] * x)))
f <- rep(letters[1:2], each = 100)
dat1 <- data.frame(x, y, f)

priors <- prior(normal(1, 2), nlpar = "b1") +
  prior(normal(0, 2), nlpar = "b2")

# assumes level 'a' is the intercept
fit1 <- brm(bf(y ~ b1 * exp(b2 * x),
               b1 + b2 ~ 1 + f, nl = TRUE),
            data = dat1, prior = priors)

# estimates intercepts for 'a' and 'b' separately
fit2 <- brm(bf(y ~ b1 * exp(b2 * x),
               b1 + b2 ~ 0 + f, nl = TRUE),
            data = dat1, prior = priors)

# same fitted values
plot(conditional_effects(fit1, "x:f"), points = TRUE)
plot(conditional_effects(fit2, "x:f"), points = TRUE)

The fits are identical within the expected simulation error.

3 Likes

Hello,
I have a question that is related.
I ran a binomial bayesian model with brms :

prior_uniform <- set_prior("beta(1, 1)", class = "b", lb = 0, ub = 1)

model_with_offset <- brm(
  formula = presence | trials(1) ~ age_class + offset(log(tot_number_utterance)) + (1|ID) + (1|group) , 
  data = Data_Call,
  family = binomial(),
  prior = prior_uniform,
  chains = 4,
  iter = 8000,
  warmup = 4000,
  seed = 123,
  control = list(adapt_delta = 0.999),
  sample_prior = 'yes'
)

I want to compare my three age classes, in order to know if the presence is significantly different (The intercept has been assigned to the adult age class) :

hypothesis(PROPCALL_model_with_offset, c("Intercept = age_classjuvenile", "Intercept = age_classinfant, age_classjuvenile= age_classinfant", point_estimate = 'mean')

Thus, I get the following error :
Sampling from the prior of an overall intercept is not possible by default. See the documentation of the ‘sample_prior’ argument in help(‘brm’).

This is what I found in the documentation :

Please also note that prior draws for the overall intercept are not obtained by default for technical reasons.

I tried to find more information, but I couldn’t really figure out if there was a solution.

I would be grateful for any help!

By default, your formula will fit the first factor level (or first alphabetically if a character) of age_class as the intercept. I think you could change the first part of your formula to ... ~ 0 + age_class + ... to make the model fit a coefficient for each level of age_class. I’m not totally positive about the behavior of hypothesis(...), and I can’t easily check without sample data for your problem. But I think you should be able to compare age classes to the reference level.

I think you might be missing open/close " between your hypotheses 2 & 3.

Because you only have 1 trial for a binomial distribution, you can use family = bernoulli() and remove the trials(1) argument from your formula. The Bernoulli is a special case of the binomial and your model will sample faster/more efficiently.

If you’re still having trouble, you’re more likely to get fast help by opening a new topic. The ☑︎ by the topic title means that it’s been solved, so folks are often less likely to notice new comments/questions. You can still link to this or other related topics to help folks get a better sense of your problem and what you’ve already tried.

Thank you very much !
I changed my formula and the family, it is indeed much faster, thanks.
I don’t get the warning message anymore, but this I get Na in the hypothesis() output :

Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
1 (Intercept)-(age_... = 0    -3.66      0.41    -4.54    -2.98         NA        NA    *
2 (Intercept)-(age_... = 0    -3.60      0.41    -4.51    -2.95         NA        NA    *
3 (age_classjuvenil... = 0     0.06      0.23    -0.39     0.51       1.95      0.66     

I guesss that I still can conclude based on the estimate and its credibility interval though

Thanks, I’m indeed new here and I didn’t really know how to do ! I’ll open a new topic if I don’t find the solution

Maybe let’s back up, @Aude. Unless you specify otherwise, family = binomial() uses the logit link, by default. That means your intercept and age_class parameters are on the log-odds scale, not the natural probability metric. All of this means that your beta(1, 1) prior will cause problems. Whereas the \beta coefficients are on the log-odds scale, which ranges from negative to positive infinity, your beta(1, 1) prior is constrained to between 0 and 1. If you intend to fit your model on the log-odds scale (which I recommend), you are better served using Gaussian or Student-t priors for the \beta coefficients. Personally, I like normal(0, 1) as a generic default prior for parameters on the log-odds scale.

Anyway, it looks like this is your first time posting on the Stan forums, @Aude. Welcome!

4 Likes

Oh, that was a big mistake ! Thank you very much for the explaination, it was really helpful. It works quit well know.

1 Like