Posterior distribution for difference of two propotions with brms in R

chrisjake · June 16, 2022, 10:09am

I would like to produce a posterior distribution for the difference of two groups. I can do this in JAGS in R, but am hoping to move into this century and replicate this with Stan. I am using brms.

My two questions are:

Have I correctly produced the posterior distribution for the effect size?
How can I account for a prior effect size that could be negative? I imagine this requires using a non-beta prior, but am not sure what would be preferable.

The below code specifies a control and intervention group, with a control and intervention group event rate. It generates beta distributions for each group (step 1), prior distributions (step 2), builds a binomial model (step 3), and posteriors (step 4).

I am reasonably confident what I’ve done in the above steps is correct. I am a bit uncertain as to what the values generated by as_draws_df are - namely the b_Intercept_p vs. b_groupint_p. Further reading on these forums suggested subtracting b_groupint_p from b_Intercept_p; which does produce a plot that could plausibly be the posterior interval (labelled new_p when plotting the alt.graph object), but I am pretty uncertain on this point and would appreciate any clarification.

# Setup
## Packages
library(tidyverse)
library(ProbBayes)
library(brms)
library(tidybayes)

## Options
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())



# ***
# Setup simulation
# ***

# Trial data
con.n = 150
con.event = 50
int.n = 150
int.event = 30


# Distributions
## x1 is the expected control event rate at probability p.x1
## x2 is the control event rate at probability p.x2
x1 = 0.4
p.x1 = 0.5
x2 = 0.6
p.x2 = 0.9

## As above, but for the difference between intervention and control group
y1 = 0.05
p.y1 = 0.5
y2 = 0.1
p.y2 = 0.9


# Model setup
## For brm()
n.chains = 3
n.iter = 4000
n.warmup = 500

## For beta.select(); based on model so the number of observations are equal
n.draws = n.iter - n.warmup



# ***
# Simulation starts here
# ***



# Define Functions
## Log transformation
fun_log.trans = function(x) {
  log.trans = log(x / (1 - x))
  log.trans
}


## Inverse log transformation
fun_invlog.trans = function(x) {
  invlog.trans = exp(x) / (1 + exp(x))
  invlog.trans
}



# Run things
# 0. Put data into dataframe
data = data.frame(group = c("int"    , "con"),
                  n =     c(int.n    , con.n),
                  event = c(int.event, con.event))


# 1. Generate prior distributions
## Beta prior for the control group event rate
beta0.val = beta.select(list(x = x1, p = p.x1),
                        list(x = x2, p = p.x2))
p0.sim = rbeta(n.draws, beta0.val[1], beta0.val[2])
### Log transform it
theta0.sim = fun_log.trans(p0.sim)


## Prior distribution for the difference in logit-p for each group
beta1.val = beta.select(list(x = y1, p = p.y1),
                        list(x = y2, p = p.y2))
p1.sim = rbeta(n.draws, beta1.val[1], beta1.val[2])
### Log transform
theta1.sim = fun_log.trans(p1.sim)


## 2. Create a matrix to store priors
priors = get_prior(family = binomial,
                   event | trials(n) ~ group,
                   data = data)

### Get characteristics of the normal distribution for the priors generated in step 1
theta0.val = c(mean(theta0.sim), sd(theta0.sim))
theta1.val = c(mean(theta1.sim), sd(theta1.sim))

### Input the these characteristics into the prior matrix generated at the start of step 2
priors$prior[3] = paste("normal(", theta0.val[1], ", ", theta0.val[2], ")", sep = "")
priors$prior[2] = paste("normal(", theta1.val[1], ", ", theta1.val[2], ")", sep = "")



# 3. Generate model with stan
model = brm(family = binomial,
            event | trials(n) ~ group,
            data = data,
            prior = priors,
            iter = n.iter,
            warmup = n.warmup,
            chains = n.chains,
            refresh = 0)



## Plot model
plot(model)
print(model)

# 4. Generate posteriors
posteriors = as_draws_df(model)

## Inverse log transform function on theta to get p again
posteriors = posteriors %>%
  mutate(across(starts_with("b_"), .f = fun_invlog.trans, .names = "{.col}_p")) %>%
  rename_with(~paste0(., "_theta"), .cols = starts_with("b_") & !ends_with("_p"))

## 95% posterior interval estimate
quantile(posteriors$b_Intercept_p, c(0.025, 0.975))


# 5. Plot posterior densities
## Take the posterior interval data and bind it with the priors
## Ideally, n.draws = number of iterations - warmup
graph = posteriors %>%
  select(ends_with("_p")) %>%
  cbind(p0.sim, p1.sim) %>%
  pivot_longer(cols = everything(),
               names_to = "distribution",
               values_to = "value")

alt.graph = posteriors %>%
  mutate(new_p = b_Intercept_p - b_groupint_p) %>%
  select(new_p) %>%
  cbind(p0.sim, p1.sim)

quantile(alt.graph$new_p, c(0.025, 0.975))

alt.graph = alt.graph %>%
  pivot_longer(cols = everything(),
               names_to = "distribution",
               values_to = "value")

## Plot it
alt.graph %>%
  ggplot(aes(x = value)) +
  geom_density(aes(fill = distribution),
               alpha = 0.5) +
  theme_light()

Topic		Replies	Views
Different posterior-based ways for describing the effect size? Modeling brms	1	404	September 24, 2021
Difference in posterior distributions for group effects for brms and hand coded stan model Modeling rstan , brms	2	931	February 17, 2022
Is my approach to posterior distribution of effect size in linear model valid? brms	8	1795	May 27, 2019
Contrasts in BRMS brms	21	5148	April 1, 2020
Extracting posterior samples of smooth effects in brms brms	1	2201	August 12, 2019

Posterior distribution for difference of two propotions with brms in R

Related topics