Bayesian difference in means test with BEST and brms packages

gpsmith · June 11, 2023, 9:12pm

The R package “BEST” calculates a simple Bayesian difference in means test. I’m trying to replicate the BEST package results using the brms package because the brms package has more flexibility and I want to learn how it works. My questions are: (1) what do the family link functions do in the brms package? Do they preprocess the data somehow?, (2) In light of the brms family link functions, how do I specify the parameters of the brms prior distributions? Do I specify the parameters as if the data were preprocessed or do I assume no preprocessing?, (3) The built-in priors for the BEST package for “sigma” and “nu” follow the Gamma distribution, which is what I’m trying to replicate. How do I correctly express those priors in the brms code? Thank you. Below is my R code.

library(BEST)

y1 <- c(5.77, 5.33, 4.59, 4.33, 3.66, 4.48)

y2 <- c(3.88, 3.55, 3.29, 2.59, 2.33, 3.59)

best_priors <- list(muM = c(6, 4), muSD = 2)

BESTout <- BESTmcmc(y1, y2, priors=best_priors, parallel=FALSE, rnd.seed = 123)

print(BESTout)

library(brms)

y <- data.frame(y = c(y1, y2), group = c(rep("y1", 6), rep("y2", 6)))

brmout <- brm(
    formula = brmsformula(y ~ 0 + group, sigma ~ 0 + group),
    data    = y,
    family  = student(link = "identity", link_sigma = "log", link_nu = "logm1"),
    prior   = c(
        set_prior("normal(6, 2)",                     class = "b", coef = "groupy1"),
        set_prior("normal(4, 2)",                     class = "b", coef = "groupy2"),
        set_prior("gamma(sd(y1), sd(y1)*5)", class = "b", coef = "groupy1", dpar = "sigma"),
        set_prior("gamma(sd(y2), sd(y2)*5)", class = "b", coef = "groupy2", dpar = "sigma"),
        set_prior("gamma(30, 30)",                class = "nu")
    ),
    seed    =   123
)

brmout

brmout <- as.data.frame(brmout)

summary(BESTout)

(mu1 <- mean(brmout$b_groupy1))

(mu2 <- mean(brmout$b_groupy2))

(muDiff <- mean(brmout$b_groupy1 - brmout$b_groupy2))

(sigma1 <- mean(exp(brmout$b_sigma_groupy1)))

(sigma2 <- mean(exp(brmout$b_sigma_groupy2)))

(sigmaDiff <- sigma1 - sigma2)

(nu <- mean(brmout$nu))

JimBob · June 11, 2023, 9:42pm

The family determines what sort of distribution the data will be modeled as (e.g., student_t, gaussian, bernoulli etc.).

The link functions can be used to indicate exactly how the model should run under the hood and also inform how you should set your priors. In the thing you have above, the link “identity” means that the MCMC for the mean will just be working with the actual numbers of the data (so if in the raw data you expect a mean of 10, you can set your priors like normal(10, 2.5).

But let’s imagine that you expect a standard deviation around the mean in the raw data to be somewhere around 3 points on the raw scale. You might think that then you should set your prior on sigma to be 3. However, the link = “log” means that the MCMC chain is actually going to be making estimations based on the log of the sample standard deviation, so when thinking about how to set your prior you should instead think about what your plausible standard deviations are and then log transform them. You could set that link function as “identity” and then you would be able to set priors on the raw scale of the data again, but there is probably a reason why it is log by default e.g., maybe it is computationally more efficient.

It doesn’t have anything to do with how you preprocess the data though. brms is just going to take whatever data you give it and try to estimate the parameters how to you tell it to, and it will drop NAs, but that’s it. You need to do any preprocessing steps yourself

gpsmith · June 11, 2023, 11:30pm

Thank you. So if my prior is distributed as Gamma with a location of 20 and a scale of 100, and my link function is “log”, I would code the prior as gamma(log(20), log(1/100)), or if my prior is distributed as Exponential with a location of 30, and my link function is “logm1”, I would code that prior as exponential(1/29)?

JimBob · June 12, 2023, 9:54am

Well, probably not exactly like that because the distribution gamma(log(20), log(1/100)) is probably not the same as the distribution of log(gamma(20, 1/100)). Just as an example I show this with a normal distribution below. Basically if you wanted to log version of the prior to match the identity version of the prior you would want to try and approximate the best way of describing log(gamma(20, 1/100)) not gamma(log(20), log(1/100)) - I’m not sure but maybe you can literally set a prior of log(gamma(20, 1/100)) [note that I’m just copy-pasting your gamma dists here, I don’t know exactly how to specify gamma myself]

test_example <- 
  tibble(identity = rnorm(5000, 10, 2.5), # normal dist prior with mean of 10 and sd of 2.5
         log_direct = rnorm(5000, log(10), log(2.5)), # normal dist with mean of log(10) and sd of log(2.5)
         log_identity = log(identity), # log transformed original normal dist
         exp_log_direct = exp(log_direct)) %>%  
  pivot_longer(cols = everything(),
               names_to = "dist")

ggplot(test_example) +
  scale_x_continuous(limits = c(0, 25)) +
  geom_density(aes(x = value, fill = dist), alpha = .33)

You can see that if you exponentiate the distribution where we set it like log(mean) and log(sd), it is very different from our target prior distribution on the identity scale

Topic		Replies	Views
How to get a test with brms brms	8	2204	September 24, 2018
Bayes factor = 0.00 Modeling brms	0	338	January 20, 2023
Difference in posterior distributions for group effects for brms and hand coded stan model Modeling rstan , brms	2	934	February 17, 2022
Help with understanding of bayesian inference using GLM and BRMS brms	3	1403	November 13, 2018
A parameterization-agnostic way to get posterior samples of marginal means in {brms} Interfaces brms	1	386	June 2, 2023

Bayesian difference in means test with BEST and brms packages

Related topics