Why does a model run with cmdstanr have both b_Intercept and Intercept in the as_draws() output?

I was running some prior predictive checks to confirm my intuition (also noted in Statistical Rethinking) that if you specify a regression model of the form value ~ 1 + gender, with ‘men’ as the reference group, the prior for ‘women’ should indicate greater uncertainty than the prior for men, because the prediction for women is formed by adding the intercept and the b_genderwomen parameter together, which is naturally more uncertain than the b_Intercept parameter alone.

However, when I ran a prior predictive check, I was surprised to find that this was not the case, although a prior predictive check on a model of the form value ~ 0 + gender did suggest that the intercepts in that model would be slightly more certain than those of the 1 + gender model (width is the width of the 95% HDI, so larger values indicate greater uncertainty):

[Screenshot: 95% HDI widths for men and women under the two model formulas]

I decided to follow this up by pulling out the parameter estimates and adding them together, and found that with cmdstanr, for the 1 + gender version of the model, the posterior draws include not only b_Intercept but also a plain Intercept:

I hadn’t seen this before and found it very strange. If I manually add each of these intercepts to the b_genderwomen parameter, I get the following:

[Screenshot: HDI widths for Intercept, b_Intercept, b_genderwomen, and their sums with b_genderwomen]

I don’t really understand what is going on here - the values from posterior_epred match b_Intercept + b_genderwomen, but I don’t see how adding one uncertain quantity to another can come out just as certain as the single quantity alone, and I don’t know where this extra Intercept parameter has come from. Running the model without the cmdstanr backend means that Intercept no longer appears in the as_draws() output, but the posterior_epred values stay the same. Can anyone explain what is going on here? In the priors there was a negative correlation between b_Intercept and b_genderwomen, so I guess they may be ‘trading off’ against one another in some fashion, but it still doesn’t match my expectations.
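As an aside, to convince myself this is even possible in principle, here is a small standalone sketch (separate from the model code below) of how a negative correlation can make a sum of two uncertain quantities no wider than one of them, since Var(A + B) = Var(A) + Var(B) + 2 Cov(A, B):

# Standalone illustration: with Var(A) = Var(B) = 1 and Cov(A, B) = -0.5,
# Var(A + B) = 1 + 1 + 2 * (-0.5) = 1, the same as Var(A) alone
set.seed(1)
A <- rnorm(1e5)
B <- -0.5 * A + rnorm(1e5, sd = sqrt(0.75))  # Var(B) = 1, Cov(A, B) = -0.5
var(A)      # ~1: A on its own
var(A + B)  # ~1 as well: the negative covariance cancels B's variance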

My expectation was that the intercept (men) would be more certain than the estimate for women in the 1 + gender model, and that in the 0 + gender model the uncertainty for men and women would be the same, and similar to that of the intercept in the 1 + gender model.

All the code to try this out runs quickly and is pasted below - any thoughts/explanations would be greatly appreciated!:

library(brms)
library(tidyverse)
library(tidybayes)

# Simulated data: 1000 values per gender, with women shifted up by 1
test <-
  tibble(men = rnorm(1000),
         women = rnorm(1000, 1)) %>% 
  pivot_longer(cols = everything(),
               names_to = "gender")

priors_1 <- c(set_prior("normal(0, 2)", class = "Intercept"),
              set_prior("normal(0, 1)", class = "b"),
              set_prior("normal(1, 2)", class = "sigma"))

test_form1 <- value ~ 1 + gender

get_prior(test_form1,
          test,
          gaussian)

test_reg1 <- 
  brm(formula = test_form1,
      family = gaussian(),
      data = test,
      control = list(adapt_delta = 0.99, max_treedepth = 15),
      prior = priors_1,
      chains = 4,
      cores = 4,
      iter = 10000,
      warmup = 1000,
      sample_prior = "only",
      backend = 'cmdstanr',
      threads = threading(4),
      seed = 1010,
      stan_model_args = list(stanc_options = list("O1"))
      )

test_form2 <- value ~ 0 + gender

priors_2 <- c(set_prior("normal(0, 2)", class = "b", coef = "gendermen"),
              set_prior("normal(0, 2)", class = "b", coef = "genderwomen"),
              set_prior("normal(1, 2)", class = "sigma"))

test_reg2 <- 
  brm(formula = test_form2,
      family = gaussian(),
      data = test,
      control = list(adapt_delta = 0.99, max_treedepth = 15),
      prior = priors_2,
      chains = 4,
      cores = 4,
      iter = 10000,
      warmup = 1000,
      sample_prior = "only",
      backend = 'cmdstanr',
      threads = threading(4),
      seed = 1010,
      stan_model_args = list(stanc_options = list("O1"))
      )

# Expected-value draws for each gender; posterior_epred returns a matrix
# whose unnamed columns become V1/V2 below
post_1 <- posterior_epred(test_reg1,
                          newdata = tibble(gender = c("men", "women"))) %>% 
  as_tibble() %>% 
  rename(men = V1,
         women = V2) %>% 
  pivot_longer(cols = everything(),
               names_to = "gender")

post_2 <- posterior_epred(test_reg2,
                          newdata = tibble(gender = c("men", "women"))) %>% 
  as_tibble() %>% 
  rename(men = V1,
         women = V2) %>% 
  pivot_longer(cols = everything(),
               names_to = "gender")

summarised <-
  bind_rows(post_1 %>% mutate(type = "value: 1 + gender"),
            post_2 %>% mutate(type = "value: 0 + gender")) %>% 
  group_by(type, gender) %>%
  summarise(width = abs(hdi(value)[1] - hdi(value)[2]))

# Note that the draws contain both b_Intercept and Intercept
test_reg1 %>% 
  as_draws_df()

summarised_2 <-
test_reg1 %>% 
  as_draws_df() %>% 
  as_tibble() %>% 
  mutate(`b_Intercept + b_genderwomen` = b_Intercept + b_genderwomen,
         `Intercept + b_genderwomen` = Intercept + b_genderwomen) %>% 
  select(Intercept,
         b_Intercept,
         b_genderwomen,
         `b_Intercept + b_genderwomen`,
         `Intercept + b_genderwomen`) %>% 
  pivot_longer(cols = everything(),
               names_to = "parameter") %>% 
  group_by(parameter) %>% 
  summarise(width = abs(hdi(value)[1] - hdi(value)[2]))

By default, brms internally centers the design matrix, so that the intercept corresponds to the expected response when all predictors are held at their means. This centered intercept is the parameter Intercept. brms then recovers the true intercept, which is the thing it reports back to you, as the generated quantity b_Intercept. As a result, the prior pushforward for the intercepts of the two groups will tend to be the same when the groups are balanced in the data, but will deviate from this similarity as the groups become unbalanced.
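To see this concretely, here is a minimal sketch using the test_reg1 fit from above. With a single 0/1 dummy, the centering means the recovered intercept is b_Intercept = Intercept - mean(dummy) * b_genderwomen:

# Verify the relationship between the centered and recovered intercepts;
# mean(dummy) is 0.5 here because the groups are balanced
draws <- as_draws_df(test_reg1)
dummy_mean <- mean(test$gender == "women")  # 0.5 in this data

all.equal(draws$b_Intercept,
          draws$Intercept - dummy_mean * draws$b_genderwomen)  # TRUE

This also explains the matching widths you saw: in the prior, Intercept and b_genderwomen are independent, so b_Intercept = Intercept - 0.5 * b_genderwomen and b_Intercept + b_genderwomen = Intercept + 0.5 * b_genderwomen have exactly the same variance.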

Of note, the prior that you pass for the intercept is applied to the parameter Intercept; that is, you are setting a prior on the intercept with the predictors held at their means. If you would instead like to place a prior on the true intercept, where all predictors are fixed to zero (i.e. to their reference levels), you can write a formula like resp ~ 0 + Intercept + gender. Now b_Intercept will be a parameter in the model rather than a generated quantity, and you can pass a prior for it as you would for any other parameter of class "b".
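Applied to your example, that would look something like this (a sketch; priors_3 and test_reg3 are just illustrative names, and I've dropped the threading/control arguments for brevity):

# 0 + Intercept: the intercept is now an ordinary class "b" coefficient,
# so the prior applies directly to the reference ("men") intercept
priors_3 <- c(set_prior("normal(0, 2)", class = "b", coef = "Intercept"),
              set_prior("normal(0, 1)", class = "b", coef = "genderwomen"),
              set_prior("normal(1, 2)", class = "sigma"))

test_reg3 <- brm(value ~ 0 + Intercept + gender,
                 family = gaussian(),
                 data = test,
                 prior = priors_3,
                 sample_prior = "only",
                 backend = "cmdstanr",
                 seed = 1010)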

Finally, if the tendency for the prior pushforwards to differ between unbalanced groups is a concern, you can get around it by using the 0 + Intercept syntax, but coding gender as -1/1 rather than dummy-coding it as 0/1. Then, as long as your prior on the associated coefficient is symmetric about zero, the prior pushforwards won’t differ across the levels no matter how unbalanced they might be.
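For example (again a sketch; gender_c and test_reg4 are illustrative names):

# Code men as -1 and women as +1; with a prior on gender_c that is
# symmetric about zero, the implied prior is identical for both groups
test_coded <- test %>% 
  mutate(gender_c = if_else(gender == "women", 1, -1))

test_reg4 <- brm(value ~ 0 + Intercept + gender_c,
                 family = gaussian(),
                 data = test_coded,
                 prior = c(set_prior("normal(0, 2)", class = "b", coef = "Intercept"),
                           set_prior("normal(0, 1)", class = "b", coef = "gender_c"),
                           set_prior("normal(1, 2)", class = "sigma")),
                 sample_prior = "only",
                 backend = "cmdstanr",
                 seed = 1010)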


Thanks @jsocolar, this is very helpful. Running the model as 0 + Intercept + gender leads to the following, which is what I would have expected from plain 1 + gender:

[Screenshot: 95% HDI widths for men and women under the 0 + Intercept + gender model]

I don’t know how many people really expect this kind of behavior - I imagine there are a lot of misspecified priors floating around, because setting up the model as 0 + Intercept + whatever_else seems quite counterintuitive.

To check that I understand what you are saying: in my gender case above, with 1 + gender, if the gender split were 50-50, then Intercept would actually not be the intercept for men, but the overall intercept for half women and half men? And when I do posterior_epred, it uses this to make predictions for the genders I specify?

In addition, I was wondering how this works in cumulative/ordinal models, where it is not possible to use 0 + in the regression formula?