Gaussian vs. t distribution for RE distributions in brms model - discrepancy in fixed effects

Hi all,

I am modeling some data using two different distributions for the random effects (Gaussian and t); @paul.buerkner kindly incorporated the ability to specify the distributions of REs in the brms call.

Here is the brms code of the two models for comparison:
md_nd = brms::brm(
  md ~ time + (1 + time | eye),
  data = df,
  family = gaussian(),
  prior = set_prior("normal(0, 10)", class = "b"),
  warmup = 1000, iter = 8000, chains = 4,
  control = list(adapt_delta = 0.8))

md_t = brms::brm(
  md ~ time + (1 + time | gr(eye, dist = "student")),
  data = df,
  family = gaussian(),
  prior = set_prior("normal(0, 10)", class = "b"),
  warmup = 1000, iter = 8000, chains = 4,
  control = list(adapt_delta = 0.8))

The only difference between the two models should be the random-effects distribution. However, when I run them, the population-level fixed effects differ substantially, both for the Intercept and for the predictor (time). I am unable to understand why this is occurring, given that I set the same normal(0, 10) prior on the betas in both models.
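One way to double-check that the two specifications really differ only in the random-effects distribution is to compare the Stan code that brms generates for each, without sampling. A minimal sketch, assuming the data frame `df` from above:

```r
library(brms)

# Generate the underlying Stan code for each specification (no sampling),
# so the two models can be diffed directly.
code_nd <- make_stancode(md ~ time + (1 + time | eye),
                         data = df, family = gaussian(),
                         prior = set_prior("normal(0, 10)", class = "b"))
code_t  <- make_stancode(md ~ time + (1 + time | gr(eye, dist = "student")),
                         data = df, family = gaussian(),
                         prior = set_prior("normal(0, 10)", class = "b"))

# Extra lines in code_t (e.g. the degrees-of-freedom parameter for the
# Student-t random effects) show exactly where the models diverge.
cat(code_nd)
cat(code_t)
```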

Is there something incorrect with the code or priors? Thank you in advance!

Hello!

How different are the estimates of the population-level effects?

My guess is that if some "eye"s have extreme responses, the two types of random effects could capture them differently. The heavier-tailed Student-t distribution should shrink extreme intercepts and slopes less, so those extremes are absorbed by the group-level terms and have less influence on the population-level effects. Conversely, if the normal distribution shrinks these extreme groups more strongly, the population-level effects might have to shift to maintain a good fit.

I think a visual exploration of the individual "eye" relationships might provide some elements to answer your question.

Did I make sense??

Have a good day!
Lucas

EDIT : typo

Hi Lucas,

Thanks for your note and thoughts.

Below are the fixed effects from the two models.

Gaussian:

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -3.77 0.06 -3.89 -3.66 1.00 1420 2708
time -0.16 0.00 -0.16 -0.15 1.00 8097 13584

t:

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -1.69 0.03 -1.75 -1.62 1.00 3909 9851
time -0.07 0.00 -0.07 -0.06 1.00 12017 20565

You can see quite a substantial difference between the two models; for both the intercept and the predictor, the estimate from one model lies outside the credible interval of the other.

Do you think that could all be from what you had suggested?

Hi!

Honestly, I am not sure. I have played with simulations to explore the possibility that a misspecified hierarchical-effect distribution biases the population-level effect estimates, but the results are not really conclusive… It seems to depend greatly on the particular dataset.

I did notice that Student-t hierarchical effects were harder to sample from, with the degrees of freedom sometimes poorly recovered.

What I would suggest doing:

  • A simulation-based calibration of the population-level effects, fitting each model to data generated under the other hierarchical-effect distribution. I’d like to test that in a case study, but it would take some time, as I have to concentrate on something else right now.
  • Compare the two models with loo and see whether one provides better out-of-sample predictions.
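For the second suggestion, a minimal sketch, assuming the fitted objects `md_nd` and `md_t` from the first post:

```r
library(brms)

# Approximate leave-one-out cross-validation for both fits
loo_nd <- loo(md_nd)
loo_t  <- loo(md_t)

# Compare expected log predictive density (elpd_loo);
# the model listed first in the output has the better predictions
loo_compare(loo_nd, loo_t)
```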

I wish you good success!
Lucas

EDIT : I attach the code for the simulations

# Empty the environment
rm(list = ls())

# Load libraries
library(brms)
library(useful)
library(tidyverse)

# Set seed
set.seed(77)
# Number of categories
K = 40

# Population-level parameters
alpha = 7
beta = 2
# Individual-level parameters, drawn from a Student-t distribution
alpha_K = rstudent_t(K, 3, 0, 1)
beta_K = rstudent_t(K, 3, 0, 1)

# Create predictor data frame
D = data.frame(time = rep(1:5, K),
               cat = factor(rep(1:K, each = 5)))
# Create model matrices
u_int <- build.x(formula = ~ cat, data = D, contrasts = FALSE)[, -1]
u_slope <- build.x(formula = ~ cat:time, data = D, contrasts = FALSE)[, -1]

# Compute linear predictor
mu <- alpha + beta * D$time + u_int %*% alpha_K + u_slope %*% beta_K

# Sample observed values
D$y <- rnorm(length(mu), mu, 1)

# Observe simulated data
D %>% ggplot(aes(x = time, y = y)) + 
  geom_point(aes(color = cat)) + 
  geom_smooth(se = FALSE, method = "lm") + 
  geom_smooth(aes(color = cat), se = FALSE, method = "lm")

# Fit a normal model to t data
form_norm <- bf(y ~ time + (1 + time||cat))

# Observe priors
get_prior(form_norm, data = D)

# Set prior
prior_norm <- c(prior(normal(0, 5), class = b),
                prior(normal(0, 3), class = sd))

# Sample from the posterior
fit_norm <- brm(form_norm,
                prior = prior_norm,
                data = D,
                chains = 3, cores = 3)
fit_norm

# Fit a t model to t data
form_t <- bf(y ~ time + (1 + time||gr(cat, dist = "student")))

# Observe priors
get_prior(form_t, data = D)

# Set prior
prior_t <- c(prior(normal(0, 5), class = b),
             prior(normal(0, 3), class = sd),
             # note: brms restricts df to be > 1, so this prior is
             # effectively truncated from below
             prior(normal(0, 10), class = df, group = cat))

# Sample from the posterior
fit_t <- brm(form_t,
             prior = prior_t,
             data = D,
             chains = 3, cores = 3)
fit_t