When I was learning Stan and Bayesian methods, the Bayes factor was not covered much, but I’m finding it might be useful to present findings to clients. I wanted to make sure that I am interpreting some output correctly.
Consider the following R code:
```r
library(rstanarm)
library(bridgesampling)

inv_logit <- function(x) exp(x) / (1 + exp(x))

set.seed(1839)
n <- 500
x <- rbinom(n, 1, .5)
y <- rbinom(n, 1, inv_logit(.3 + .2 * x))
z <- rbinom(n, 1, .3)
dat <- data.frame(x, y, z)
```
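As a quick sanity check (my addition, not part of the question itself), the success probabilities implied by the simulation can be computed directly; `inv_logit` here is the same function as base R's `plogis`:

```r
inv_logit <- function(x) exp(x) / (1 + exp(x))

# Implied P(y = 1) in each arm of the simulation
p_x0 <- inv_logit(.3)        # x = 0 arm
p_x1 <- inv_logit(.3 + .2)   # x = 1 arm
round(c(p_x0, p_x1), 3)      # roughly 0.574 and 0.622
```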
This creates the data. Now I can look at the probability (given the data, priors, and likelihood) that there is a positive effect of x (note that this is usually done with some value of practical significance, like 1 ppt, but I left it at zero here):
```r
# probability of effect of x > 0
mod <- stan_glm(y ~ x, binomial, dat, seed = 1839)
mean(as.matrix(mod)[, 2] > 0)
```
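To make explicit what that last line computes: the posterior probability of a positive effect is just the fraction of posterior draws of the coefficient that fall above zero. A toy sketch with made-up draws standing in for `as.matrix(mod)[, 2]` (the values here are assumed for illustration, not the model's actual output):

```r
set.seed(1)
# Stand-in for MCMC draws of the x coefficient
draws <- rnorm(4000, mean = 0.2, sd = 0.1)
mean(draws > 0)  # Monte Carlo estimate of P(beta_x > 0 | data)
```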
This gives me about a 97% chance that the effect is positive. Now, let's imagine we want to test whether the effect of x depends on a demographic variable z. I could do some model comparison like:
```r
# probability of no interaction vs. interaction
mod_h0 <- stan_glm(y ~ x + z, binomial, dat, seed = 1839,
                   diagnostic_file = file.path(tempdir(), "df.csv"))
mod_h1 <- stan_glm(y ~ x * z, binomial, dat, seed = 1839,
                   diagnostic_file = file.path(tempdir(), "df.csv"))
bayes_factor(bridge_sampler(mod_h0), bridge_sampler(mod_h1))
```
This tells me that the Bayes factor for h0 / h1 is 4.32. Since the parameter priors in both models are the same (note that I usually give them semi-informative priors, but they are always the same in both models), the Bayes factor simplifies to the posterior probability of no interaction over the posterior probability of an interaction. So, would a fair interpretation be: "It is 4.32 times more likely that the effect of x is the same across both groups of z" and "There is about an 81% chance there is no interaction, while there's about a 19% chance the interaction is non-zero" (since 4.32 / (4.32 + 1) ≈ .81)?
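The arithmetic behind that last step, written out (this assumes equal prior odds on the two models, so the Bayes factor equals the posterior odds):

```r
# Posterior probability of h0 from a Bayes factor BF(h0/h1),
# assuming prior odds of 1 between the two models
bf_to_prob <- function(bf) bf / (bf + 1)
bf_to_prob(4.32)  # ~ 0.812
```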
Am I interpreting the Bayes factor correctly?