I would like to produce a posterior distribution for the difference of two groups. I can do this in JAGS in R, but am hoping to move into this century and replicate this with Stan. I am using brms.
My two questions are:
- Have I correctly produced the posterior distribution for the effect size?
- How can I account for a prior effect size that could be negative? I imagine this requires using a non-beta prior, but am not sure what would be preferable.
The below code specifies a control and intervention group, with a control and intervention group event rate. It generates beta distributions for each group (step 1), prior distributions (step 2), builds a binomial model (step 3), and posteriors (step 4).
I am reasonably confident what I’ve done in the above steps is correct. I am a bit uncertain as to what the values generated by as_draws_df are - namely the b_Intercept_p vs. b_groupint_p. Further reading on these forums suggested subtracting b_groupint_p from b_Intercept_p; which does produce a plot that could plausibly be the posterior interval (labelled new_p when plotting the alt.graph object), but I am pretty uncertain on this point and would appreciate any clarification.
# Setup
## Packages
library(tidyverse)
library(ProbBayes)
library(brms)
library(tidybayes)
## Options
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
# ***
# Setup simulation
# ***
# Trial data
con.n = 150
con.event = 50
int.n = 150
int.event = 30
# Distributions
## x1 is the expected control event rate at probability p.x1
## x2 is the control event rate at probability p.x2
x1 = 0.4
p.x1 = 0.5
x2 = 0.6
p.x2 = 0.9
## As above, but for the difference between intervention and control group
y1 = 0.05
p.y1 = 0.5
y2 = 0.1
p.y2 = 0.9
# Model setup
## For brm()
n.chains = 3
n.iter = 4000
n.warmup = 500
## For beta.select(); based on model so the number of observations are equal
n.draws = n.iter - n.warmup
# ***
# Simulation starts here
# ***
# Define Functions
## Log transformation
fun_log.trans = function(x) {
log.trans = log(x / (1 - x))
log.trans
}
## Inverse log transformation
fun_invlog.trans = function(x) {
invlog.trans = exp(x) / (1 + exp(x))
invlog.trans
}
# Run things
# 0. Put data into dataframe
data = data.frame(group = c("int" , "con"),
n = c(int.n , con.n),
event = c(int.event, con.event))
# 1. Generate prior distributions
## Beta prior for the control group event rate
beta0.val = beta.select(list(x = x1, p = p.x1),
list(x = x2, p = p.x2))
p0.sim = rbeta(n.draws, beta0.val[1], beta0.val[2])
### Log transform it
theta0.sim = fun_log.trans(p0.sim)
## Prior distribution for the difference in logit-p for each group
beta1.val = beta.select(list(x = y1, p = p.y1),
list(x = y2, p = p.y2))
p1.sim = rbeta(n.draws, beta1.val[1], beta1.val[2])
### Log transform
theta1.sim = fun_log.trans(p1.sim)
## 2. Create a matrix to store priors
priors = get_prior(family = binomial,
event | trials(n) ~ group,
data = data)
### Get characteristics of the normal distribution for the priors generated in step 1
theta0.val = c(mean(theta0.sim), sd(theta0.sim))
theta1.val = c(mean(theta1.sim), sd(theta1.sim))
### Input the these characteristics into the prior matrix generated at the start of step 2
priors$prior[3] = paste("normal(", theta0.val[1], ", ", theta0.val[2], ")", sep = "")
priors$prior[2] = paste("normal(", theta1.val[1], ", ", theta1.val[2], ")", sep = "")
# 3. Generate model with stan
model = brm(family = binomial,
event | trials(n) ~ group,
data = data,
prior = priors,
iter = n.iter,
warmup = n.warmup,
chains = n.chains,
refresh = 0)
## Plot model
plot(model)
print(model)
# 4. Generate posteriors
posteriors = as_draws_df(model)
## Inverse log transform function on theta to get p again
posteriors = posteriors %>%
mutate(across(starts_with("b_"), .f = fun_invlog.trans, .names = "{.col}_p")) %>%
rename_with(~paste0(., "_theta"), .cols = starts_with("b_") & !ends_with("_p"))
## 95% posterior interval estimate
quantile(posteriors$b_Intercept_p, c(0.025, 0.975))
# 5. Plot posterior densities
## Take the posterior interval data and bind it with the priors
## Ideally, n.draws = number of iterations - warmup
graph = posteriors %>%
select(ends_with("_p")) %>%
cbind(p0.sim, p1.sim) %>%
pivot_longer(cols = everything(),
names_to = "distribution",
values_to = "value")
alt.graph = posteriors %>%
mutate(new_p = b_Intercept_p - b_groupint_p) %>%
select(new_p) %>%
cbind(p0.sim, p1.sim)
quantile(alt.graph$new_p, c(0.025, 0.975))
alt.graph = alt.graph %>%
pivot_longer(cols = everything(),
names_to = "distribution",
values_to = "value")
## Plot it
alt.graph %>%
ggplot(aes(x = value)) +
geom_density(aes(fill = distribution),
alpha = 0.5) +
theme_light()