How to scale your priors by number of predictors in logistic regression

Just to follow up on what @andrewgelman said, centering changes the interpretation of your main effects because each one is a conditional estimate: the effect of one predictor when the other predictor equals zero. It’s always helpful to plot the model predictions to test your understanding. I get surprised by this type of thing more often than I’d like to admit, but the predictions usually reveal what would have been obvious had I been paying attention.

What you’ll find is that the two models give the same predictions (exactly the same with no priors, and practically identical in this case). There is no conditional main effect of sex in the centered model because it just so happens that the male and female prediction lines cross almost exactly at Intervention = 0.5. If you were to change the male success rates to, say, 0.3 and 0.7 (i.e. y = n*c(0.3, 0.70, 0.25, 0.25), where the first two entries are the male rows), then the centered model would have a significant conditional main effect for sex. And the standard error is smaller in the centered version because you’re comparing the sexes at the “center” of Intervention rather than way out on one end. Those estimates on the end will be less stable in most cases.
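If you want to check the no-prior case without waiting on Stan, here is a minimal base-R sketch. It assumes maximum likelihood via glm() as a stand-in for the brms fit with flat priors; the names fit_unc, fit_cen, se_unc and se_cen are just for illustration.

```r
# Same mock data as the brms example, fit by maximum likelihood (no priors)
n <- 200
dat <- data.frame(Intervention = c(0, 1, 0, 1),
                  SexF = c(0, 0, 1, 1),
                  y = round(n * c(0.1, 0.50, 0.25, 0.25)),
                  n = n)
dat2 <- dat
dat2$Intervention <- dat$Intervention - 0.5
dat2$SexF <- dat$SexF - 0.5

fit_unc <- glm(cbind(y, n - y) ~ Intervention * SexF, family = binomial, data = dat)
fit_cen <- glm(cbind(y, n - y) ~ Intervention * SexF, family = binomial, data = dat2)

# Both parameterizations give the same fitted probabilities
all.equal(unname(fitted(fit_unc)), unname(fitted(fit_cen)))

# On the logit scale, the male and female lines meet at the midpoint of Intervention
male_mid <- (qlogis(0.1) + qlogis(0.5)) / 2
female_mid <- (qlogis(0.25) + qlogis(0.25)) / 2
c(male = male_mid, female = female_mid)

# Conditional sex effect: at Intervention = 0 (uncentered) vs. at the center
coef(fit_unc)[["SexF"]]
coef(fit_cen)[["SexF"]]

# Standard errors of the conditional sex effect in each parameterization
se_unc <- summary(fit_unc)$coefficients["SexF", "Std. Error"]
se_cen <- summary(fit_cen)$coefficients["SexF", "Std. Error"]
c(uncentered = se_unc, centered = se_cen)
```

The SexF coefficient differs between the two fits only because it answers a different question in each: the sex difference at Intervention = 0 versus the sex difference halfway between the two Intervention levels.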

In the end, centering does impact the results if you don’t also adjust the priors. That is, if you are going to shrink towards zero, then the meaning of zero is relevant. In this case, though, it’s not the reason for different model estimates.
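To see the prior side of this in isolation, here is a rough base-R sketch. It is deliberately simplified and is not what brms does: it finds posterior modes with optim() instead of sampling, puts independent Cauchy(0, scale) priors on the non-intercept coefficients with a flat prior on the intercept, and uses a much tighter scale (0.25, my own choice) than the 2.5 in the example so that the shrinkage is visible. The helper names neg_log_post and map_probs are hypothetical.

```r
n <- 200
dat <- data.frame(Intervention = c(0, 1, 0, 1),
                  SexF = c(0, 0, 1, 1),
                  y = round(n * c(0.1, 0.50, 0.25, 0.25)),
                  n = n)

# Negative log posterior: binomial likelihood plus independent Cauchy(0, scale)
# priors on every coefficient except the intercept (assumed flat here)
neg_log_post <- function(beta, X, y, n, scale) {
  eta <- drop(X %*% beta)
  -sum(dbinom(y, n, plogis(eta), log = TRUE)) -
    sum(dcauchy(beta[-1], location = 0, scale = scale, log = TRUE))
}

# Posterior-mode cell probabilities after shifting both predictors by `shift`
map_probs <- function(data, shift, scale) {
  d <- data
  d$Intervention <- d$Intervention - shift
  d$SexF <- d$SexF - shift
  X <- model.matrix(~ Intervention * SexF, data = d)
  fit <- optim(rep(0, ncol(X)), neg_log_post, X = X, y = d$y, n = d$n,
               scale = scale, method = "BFGS",
               control = list(maxit = 500, reltol = 1e-12))
  plogis(drop(X %*% fit$par))
}

p_unc <- map_probs(dat, shift = 0, scale = 0.25)
p_cen <- map_probs(dat, shift = 0.5, scale = 0.25)
round(cbind(observed = dat$y / dat$n, uncentered = p_unc, centered = p_cen), 3)
```

The same prior pulls different quantities toward zero in the two parameterizations, so the fitted probabilities no longer agree. With a scale as wide as 2.5 and this much data the gap should shrink toward nothing, which fits the point that priors are not what drives the different estimates in this example.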

library(dplyr)
library(ggplot2)
library(patchwork)
library(brms)

# Uncentered model
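n <- 200
# Rows 1-2 are male (SexF = 0), rows 3-4 are female (SexF = 1)
dat <- data.frame(Intervention = c(0, 1, 0, 1), SexF = c(0, 0, 1, 1), y = n * c(0.1, 0.50, 0.25, 0.25), n)
# prior() defaults to class "b", so the Cauchy prior applies to all non-intercept coefficients
Mod <- brm(y | trials(n) ~ Intervention * SexF, prior = prior(cauchy(0, 2.5)), family = binomial, data = dat)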
summary(Mod)

pred_df <- expand.grid(Intervention = seq(from = 0, to = 1, length.out = 101),
                       SexF = c(0, 0.5, 1),
                       n = 200)

uncentered_plot <- fitted(Mod, newdata = pred_df) %>%
  as.data.frame() %>%  # fitted() returns a matrix
  bind_cols(pred_df) %>%
  mutate(sex = ordered(SexF, levels = c(0, 0.5, 1), labels = c('Male (0)', '"Average (0.5)"', 'Female (1)'))) %>%
  ggplot(aes(Intervention, color = sex, fill = sex)) +
      geom_vline(xintercept = 0, linetype = 2) +
      #geom_ribbon(aes(ymin = `Q2.5`, ymax = `Q97.5`), alpha = 0.5, color = rgb(0,0,0,0)) +
      geom_line(aes(y = Estimate), linewidth = 1) +
      scale_x_continuous('Intervention', breaks = c(0, 0.5, 1)) +
      scale_fill_brewer(palette = 'Set1') +
      scale_color_brewer(palette = 'Set1') +
      labs(title = 'Predictions from uncentered model')

# Centered model (same formula and priors, both predictors shifted by 0.5)
dat2 <- with(dat, data.frame(Intervention = Intervention-0.5, SexF = SexF-0.5, y, n))
Mod2 <- update(Mod, newdata = dat2)
summary(Mod2)

pred_df2 <- expand.grid(Intervention = seq(from = -0.5, to = 0.5, length.out = 101),
                       SexF = c(-0.5, 0, 0.5),
                       n = 200)

centered_plot <- fitted(Mod2, newdata = pred_df2) %>%
  as.data.frame() %>%  # fitted() returns a matrix
  bind_cols(pred_df2) %>%
  mutate(sex = ordered(SexF, levels = c(-0.5, 0, 0.5), labels = c('Male (-0.5)', '"Average (0)"', 'Female (0.5)'))) %>%
  ggplot(aes(Intervention, color = sex, fill = sex)) +
    geom_vline(xintercept = 0, linetype = 2) +
    #geom_ribbon(aes(ymin = `Q2.5`, ymax = `Q97.5`), alpha = 0.5, color = rgb(0,0,0,0)) +
    geom_line(aes(y = Estimate), linewidth = 1) +
    scale_x_continuous('Intervention - 0.5', breaks = c(-0.5, 0, 0.5)) +
    scale_fill_brewer(palette = 'Set1') +
    scale_color_brewer(palette = 'Set1') +
    labs(title = 'Predictions from centered model')

uncentered_plot + centered_plot

Very good points. The big takeaway seems to be that my confusion with the centered-data model was largely self-inflicted, i.e. attributable to the artificial symmetry of the mock data. Such a scenario is unlikely to occur with real datasets, so the putative interpretational opacity is less of an issue than I initially supposed.

Blokeman:

Yes, I’m much more inclined to use more informative priors. Here’s an example: “Default informative priors for effect sizes: Where do they come from?” (Statistical Modeling, Causal Inference, and Social Science).
Also I’ve moved forward in the following sense:

  • I used to think about priors being informative or noninformative (for example, in the first two editions of BDA).
  • Around 2008 I started thinking about weakly informative priors as an intermediate category.
  • More recently I’ve been thinking about five levels of priors: (1) flat, (2) super-vague but proper, (3) weakly informative prior, (4) generic informative prior, (5) specific informative prior. See the beginning of the Prior Choice Wiki here: “Prior Choice Recommendations” (stan-dev/stan wiki on GitHub).

P.S. The Prior Choice Wiki does not seem to be a wiki anymore! At least, I have no idea how to edit it now. Back when it was a wiki, I could just go in and fix and add things.
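For a rough feel of how far apart those levels sit in a logistic regression, here is a small base-R sketch. The normal(0, sd) form and the three sd values are my own illustrative assumptions, not recommendations quoted from the wiki, and a truly flat prior has no sd to plug in.

```r
# Hypothetical normal(0, sd) priors on a single logistic-regression coefficient;
# the sd values are illustrative assumptions only
prior_sd <- c(super_vague = 1e6, very_weak = 10, generic = 1)

# Prior probability that |coefficient| exceeds qlogis(0.99), i.e. that a
# one-unit change in the predictor moves a 50% probability outside 1%-99%
extreme_mass <- 2 * pnorm(-qlogis(0.99) / prior_sd)
round(extreme_mass, 3)
```

The wider the prior, the more mass it puts on effects that would push predictions all the way to 0 or 1 for a one-unit change, which is one way to think about why "vague" is not the same as "harmless" on the logit scale.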

I’ll just add that the generalized R2D2 paper (https://arxiv.org/pdf/2111.10718.pdf) covers the prior induced on coefficients via a prior on a specific form of R^2 in the logistic case. I’ve never used those priors myself, but the reasoning seems sensible to me.
