Question about sigma posterior not being affected much by sigma prior

willemsleegers · December 12, 2020, 12:44pm

Hello everyone,

I probably have another silly question for you. I am trying to teach myself Bayesian statistics and I have run into something that I do not quite understand.

I have simulated some data (rnorm(352, mean = 155, sd = 10)) to which I apply two simple intercept-only models. In the first model, I use the default brms priors and in the second model I use a stricter prior for sigma. Yet, in the output, the estimates for sigma seem to be unaffected by the stricter prior on sigma, as seen in this graph:

question_priors_posteriors.pdf (10.4 KB)

Why is this the case? Could it be due to the prior on the intercept?

Here is the exact code I used:

library(brms)
library(tidyverse)

# Simulate data
set.seed(4)
data <- tibble(x = rnorm(352, mean = 155, sd = 10))

# Run model 1, the default brms model
model_default <- brm(
  x ~ 1,
  data = data,
  family = gaussian,
  prior = c(
    prior(student_t(3, 154.6, 10), class = "Intercept"),
    prior(student_t(3, 0, 10), class = "sigma")
  ),
  cores = 4,
  seed = 4,
  sample_prior = TRUE
)

# Run model 2 with a smaller sigma in the student_t prior for sigma
model_stricter_sigma <- brm(
  x ~ 1,
  data = data,
  family = gaussian,
  prior = c(
    prior(student_t(3, 154.6, 10), class = "Intercept"),
    prior(student_t(3, 0, 1), class = "sigma")
  ),
  cores = 4,
  seed = 4,
  sample_prior = TRUE
)

# Look at model output
model_default
model_stricter_sigma

# Visualize priors and posteriors
results_default <- model_default %>%
  posterior_samples() %>%
  select(ends_with("sigma")) %>%
  pivot_longer(cols = everything()) %>%
  mutate(
      model = "default",
      name = if_else(str_detect(name, "prior"), "prior", "posterior")
    )

results_strict <- model_stricter_sigma %>%
  posterior_samples() %>%
  select(ends_with("sigma")) %>%
  pivot_longer(cols = everything()) %>%
  mutate(
      model = "strict",
      name = if_else(str_detect(name, "prior"), "prior", "posterior")
    )

results <- bind_rows(results_default, results_strict)

ggplot(results, aes(x = value, fill = name)) +
  geom_histogram(binwidth = 0.1) +
  facet_wrap(~ model, ncol = 1) +
  coord_cartesian(xlim = c(0, 20))

Operating System: macOS Big Sur (latest)
brms Version: 2.14.4

hhau · December 12, 2020, 12:58pm

352 observations is quite a lot for such a simple model – try with 2, 5, or 10 observations and you should be able to see the impact of the prior.

BDA discusses this phenomenon, in a slightly different setting, in Section 2.5. In a hand-wavy kind of way, the posterior is a weighted average of the prior and the likelihood, where the weights depend on the sample size and the variability.
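To make the weighted-average idea concrete, here is a minimal sketch of that "slightly different setting": a normal model for the mean with sigma treated as known, where the posterior mean is a precision-weighted average of the prior mean and the sample mean. The function name `posterior_mean` and all the numbers (a prior centred on 100, data centred on 155) are made up for illustration and are not taken from the models above.

```r
# Conjugate normal model with known sigma (illustrative assumption):
# the posterior mean of mu is a precision-weighted average of the
# prior mean (mu0) and the sample mean (ybar).
posterior_mean <- function(ybar, n, sigma, mu0, tau0) {
  w_prior <- 1 / tau0^2   # precision of the prior
  w_data  <- n / sigma^2  # precision of the data, grows with n
  (w_prior * mu0 + w_data * ybar) / (w_prior + w_data)
}

# Hypothetical numbers: the prior says 100, the data say 155
n_obs <- c(2, 5, 10, 352)
post_means <- sapply(n_obs, function(n) {
  posterior_mean(ybar = 155, n = n, sigma = 10, mu0 = 100, tau0 = 10)
})
print(data.frame(n = n_obs, posterior_mean = post_means))
```

With only a couple of observations the prior still drags the estimate towards 100; with 352 observations the data weight is so much larger that the prior hardly matters.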

willemsleegers · December 12, 2020, 2:07pm

Hm… I’m sure that is correct, but it still seems weird to me.

With the strict prior, there is very little density around 10, yet that is where the posterior settles. I’m surprised there is so little difference between the two priors in terms of the posterior. Even when I use a sample size of 50, the differences between the two posteriors are minimal and barely noticeable to the human eye.

I also don’t find BDA particularly helpful; there’s way too much formula-speak and not enough intuition. I’m a much bigger fan of Statistical Rethinking (which unfortunately introduces this simple model with an unusual uniform sigma prior).

So, summarizing your comment, the explanation is that the data simply overwhelms the prior and that’s why they are so similar?

hhau · December 12, 2020, 2:28pm

Yes. It’s hard to say more without getting mathematical, but one other important point is that the prior is a Student-t density whilst your likelihood is a Gaussian.

Specifically, the Student-t has much heavier tails than the Gaussian, so whilst it might look like there is ‘very little density at sigma ≈ 10’, there is (in an equally hand-wavy kind of way):

> dt(10, df = 3) / dnorm(10)
[1] 4.0523e+18

18 orders of magnitude more density with the Student-t than the Gaussian. Try using Gaussian (half-normal) priors for sigma and see what happens.
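If you want to check this without refitting in brms, a rough grid approximation in base R is enough. The sketch below reuses the simulated data from the first post and, as a simplifying assumption, fixes the mean at the sample mean so that only sigma is unknown. The grid range and the helper name `grid_posterior_mean` are arbitrary choices for illustration, not anything brms does internally.

```r
# Same simulated data as in the original post
set.seed(4)
x <- rnorm(352, mean = 155, sd = 10)

# Grid of candidate sigma values (range chosen by hand)
sigma_grid <- seq(0.05, 30, by = 0.01)

# Log-likelihood of each sigma, with mu fixed at the sample mean
# (a simplifying assumption to keep the problem one-dimensional)
log_lik <- sapply(sigma_grid, function(s) {
  sum(dnorm(x, mean = mean(x), sd = s, log = TRUE))
})

# Posterior mean of sigma on the grid for a given log-prior.
# Normalising constants of the half-distributions cancel out.
grid_posterior_mean <- function(log_prior) {
  log_post <- log_lik + log_prior
  w <- exp(log_post - max(log_post))  # subtract the max to avoid underflow
  sum(sigma_grid * w) / sum(w)
}

post_t      <- grid_posterior_mean(dt(sigma_grid, df = 3, log = TRUE))          # half-Student-t(3, 0, 1)
post_normal <- grid_posterior_mean(dnorm(sigma_grid, 0, 1, log = TRUE))  # half-normal(0, 1)

print(c(sample_sd = sd(x), half_t = post_t, half_normal = post_normal))
```

The half-Student-t result should sit very close to the sample standard deviation, while the half-normal prior with the same scale should pull the estimate visibly lower, because its tail penalises values around 10 far more heavily.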

willemsleegers · December 12, 2020, 2:35pm

Ah, that’s interesting. I was aware that the Student-t has heavier tails than the Gaussian, but I didn’t realize the difference was this large.

Running the code with Gaussian priors indeed has the effect I initially expected.

This helps a lot, thanks!
