Question about sigma posterior not being affected much by sigma prior

Hello everyone,

I probably have another silly question for you. I am trying to teach myself Bayesian statistics and I have run into something that I do not quite understand.

I have simulated some data (rnorm(352, mean = 155, sd = 10)) to which I apply two simple intercept-only models. In the first model, I use the default brms priors and in the second model I use a stricter prior for sigma. Yet, in the output, the estimates for sigma seem to be unaffected by the stricter prior on sigma, as seen in this graph:

question_priors_posteriors.pdf (10.4 KB)

Why is this the case? Could it be due to the prior on the intercept?

Here is the exact code I used:

library(brms)
library(tidyverse)

# Simulate data
set.seed(4)
data <- tibble(x = rnorm(352, mean = 155, sd = 10))

# Run model 1, the default brms model
model_default <- brm(
  x ~ 1,
  data = data,
  family = gaussian,
  prior = c(
    prior(student_t(3, 154.6, 10), class = "Intercept"),
    prior(student_t(3, 0, 10), class = "sigma")
  ),
  cores = 4,
  seed = 4,
  sample_prior = TRUE
)

# Run model 2 with a smaller scale in the student_t prior for sigma
model_stricter_sigma <- brm(
  x ~ 1,
  data = data,
  family = gaussian,
  prior = c(
    prior(student_t(3, 154.6, 10), class = "Intercept"),
    prior(student_t(3, 0, 1), class = "sigma")
  ),
  cores = 4,
  seed = 4,
  sample_prior = TRUE
)

# Look at model output
model_default
model_stricter_sigma

# Visualize priors and posteriors
results_default <- model_default %>%
  posterior_samples() %>%
  select(ends_with("sigma")) %>%
  pivot_longer(cols = everything()) %>%
  mutate(
      model = "default",
      name = if_else(str_detect(name, "prior"), "prior", "posterior")
    )

results_strict <- model_stricter_sigma %>%
  posterior_samples() %>%
  select(ends_with("sigma")) %>%
  pivot_longer(cols = everything()) %>%
  mutate(
      model = "strict",
      name = if_else(str_detect(name, "prior"), "prior", "posterior")
    )

results <- bind_rows(results_default, results_strict)

ggplot(results, aes(x = value, fill = name)) +
  geom_histogram(binwidth = 0.1) +
  facet_wrap(~ model, ncol = 1) +
  coord_cartesian(xlim = c(0, 20))
  • Operating System: macOS Big Sur (latest)
  • brms Version: 2.14.4

352 observations is quite a lot for such a simple model – try with 2, 5, or 10 observations and you should be able to see the impact of the prior.
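
If you want to see this without refitting in Stan every time, here is a rough grid-approximation sketch. It is not what brms does internally: it assumes the mean is fixed at the sample mean so that only sigma is unknown, and the helper names (`log_half_t`, `posterior_mean_sigma`) are made up for this illustration.

```r
# Grid approximation of the posterior for sigma under two half-Student-t priors.
# Simplifying assumption: mu is fixed at the sample mean (not the full brms model).
set.seed(4)
x_all <- rnorm(352, mean = 155, sd = 10)

# Log density of a half-Student-t(3, 0, scale) prior, up to an additive constant
log_half_t <- function(sigma, scale) dt(sigma / scale, df = 3, log = TRUE)

# Posterior mean of sigma on a grid, for data x and a given prior scale
posterior_mean_sigma <- function(x, prior_scale) {
  grid <- seq(0.05, 60, by = 0.05)
  log_lik <- sapply(grid, function(s) sum(dnorm(x, mean(x), s, log = TRUE)))
  log_post <- log_lik + log_half_t(grid, prior_scale)
  w <- exp(log_post - max(log_post))  # unnormalised posterior weights
  sum(grid * w) / sum(w)
}

# Compare the two priors on the first n observations
sizes <- c(2, 5, 10, 50, 352)
comparison <- data.frame(
  n = sizes,
  wide = sapply(sizes, function(n) posterior_mean_sigma(x_all[1:n], 10)),
  strict = sapply(sizes, function(n) posterior_mean_sigma(x_all[1:n], 1))
)
comparison$gap <- comparison$wide - comparison$strict
print(comparison)
```

The `gap` column is the difference between the two posterior means; it should shrink as n grows, which is the prior being overwhelmed by the data.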

BDA discusses this phenomenon, in a slightly different setting, in Section 2.5. In a hand-wavy kind of way, the posterior is a weighted average of the prior and the likelihood, where the weights depend on the sample size and variability.
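
To make the "weighted average" concrete, here is the textbook conjugate case, which I believe is the setting of that section: a normal mean with known variance. This is about the mean rather than sigma, so take it as an analogy only. The symbols are my own labels: prior $\mu \sim \mathrm{N}(\mu_0, \tau_0^2)$, data $y_1, \dots, y_n \sim \mathrm{N}(\mu, \sigma^2)$ with $\sigma$ known, and sample mean $\bar{y}$.

$$
\mu_n = \frac{\dfrac{1}{\tau_0^2}\,\mu_0 + \dfrac{n}{\sigma^2}\,\bar{y}}{\dfrac{1}{\tau_0^2} + \dfrac{n}{\sigma^2}},
\qquad
\frac{1}{\tau_n^2} = \frac{1}{\tau_0^2} + \frac{n}{\sigma^2}
$$

The weight on the data, $n/\sigma^2$, grows linearly with $n$, while the weight on the prior, $1/\tau_0^2$, stays fixed, so with a few hundred observations the prior term barely registers.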

Hm… I’m sure that is correct, but it still seems weird to me.

With the strict prior, there is very little density around 10, yet that is where the posterior settles. I’m surprised there is so little difference between the posteriors under the two priors. Even when I use a sample size of 50, the differences between the two posteriors are minimal and barely noticeable to the human eye.

I also don’t find BDA particularly helpful; there’s way too much formula-speak and not enough intuition. I’m a much bigger fan of Statistical Rethinking (which unfortunately introduces this simple model with an unusual uniform sigma prior).

So, summarizing your comment, the explanation is that the data simply overwhelms the prior and that’s why they are so similar?

Yes. It’s hard to say more without getting mathematical, but one other important point is that the prior is a Student-t density whilst your likelihood is a Gaussian.

Specifically, the Student-t has much heavier tails than the Gaussian, so whilst it might look like there is ‘very little density at $\sigma \approx 10$’, there is (in an equally hand-wavy kind of way):

> dt(10, df = 3) / dnorm(10)
[1] 4.0523e+18

18 orders of magnitude more density with the Student-t than the Gaussian. Try using Gaussian (half-normal) priors for sigma and see what happens.
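
If you want a quick feel for it before refitting in brms, here is a rough grid sketch on the same simulated data. It is only an approximation of your model: it assumes the mean is fixed at the sample mean, and `grid_mean` is a made-up helper for this illustration.

```r
# Compare half-Student-t and half-normal priors for sigma on the full data set.
# Simplifying assumption: mu is fixed at the sample mean (not the full brms model).
set.seed(4)
x <- rnorm(352, mean = 155, sd = 10)

# Log-likelihood of the data at each candidate value of sigma
grid <- seq(0.05, 30, by = 0.01)
log_lik <- sapply(grid, function(s) sum(dnorm(x, mean(x), s, log = TRUE)))

# Posterior mean of sigma on the grid for a given log prior (up to a constant)
grid_mean <- function(log_prior) {
  log_post <- log_lik + log_prior
  w <- exp(log_post - max(log_post))  # unnormalised posterior weights
  sum(grid * w) / sum(w)
}

post_means <- c(
  half_t_scale_10  = grid_mean(dt(grid / 10, df = 3, log = TRUE)),
  half_t_scale_1   = grid_mean(dt(grid, df = 3, log = TRUE)),
  half_normal_sd_1 = grid_mean(dnorm(grid, mean = 0, sd = 1, log = TRUE))
)
print(round(post_means, 2))
```

The two Student-t priors should give nearly the same posterior mean, while the half-normal with the same scale of 1 pulls sigma visibly towards zero, because its tail penalises values near 10 far more heavily.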


Ah, that’s interesting. I was aware that the Student-t has heavier tails than the Gaussian, but I didn’t realize it was this much.

Running the code with Gaussian priors indeed has the effect I initially expected.

This helps a lot, thanks!
