The problem
The popular book ‘Statistical Rethinking’ by Richard McElreath uses index variables (instead of indicator/dummy variables) during specification of Bayesian models. Solomon Kurz kindly translated the code for use with {brms}.
Although the aforementioned books cover how to use index variables for interactions with continuous terms, it is less clear how this would be achieved with two categorical variables. I know I should use the non-linear {brms} syntax, and I attempted three models below (both should be easily reproducible).
I am uncertain if it is reasonable to simply add the interaction term; furthermore, the additional terms in the output become a bit trickier to understand. I am not sure where to begin except by plotting.
Setup
Below is some fake simulated data creating a positive association between outcome
by group
, but a reversed relationship appears when examined by outcome
, group
and covar
.
set.seed(1295)
# Libraries
library(dplyr)
library(ggplot2)
library(brms)
# Fake data where overall 1 group is + associated but by covar they have opposite
covar_vec <- c('Problem', 'NoProblem')
fake_data <- data.frame(
group = factor(c(rep(1, 10), rep(2, 10))),
covar = factor(c(rep(covar_vec[1], 5), rep(covar_vec[2], 5),
rep(covar_vec[1], 5), rep(covar_vec[2], 5))),
outcome = factor(c(1,1,1,1,0,
0,0,0,1,1,
0,0,0,1,1,
1,1,1,0,0))
)
Examine data relationship
Data shows the positive and negative assocations
# Plot to confirm relationship
fake_data_sum1 <- fake_data |> dplyr::group_by(covar, outcome) |> dplyr::summarize(n = dplyr::n())
fake_data_sum2 <- fake_data |> dplyr::group_by(group, outcome) |> dplyr::summarize(n = dplyr::n())
fake_data_sum3 <- fake_data |> dplyr::group_by(group, covar, outcome) |> dplyr::summarize(n = dplyr::n())
ggplot(fake_data_sum3) +
geom_col(aes(x = group, y = n, fill = outcome)) +
facet_wrap(~covar, nrow = 2) +
theme_bw()
Models
Created models with no interaction and with an interaction using index variables. Priors are assumed to be flat/very weak on the outcome scale. The model without the interaction terms appears to only discover the linear relationship. Once group:covar
is included, the reverse direction is seen. The second method for including interactive terms is very similar, though perhaps not identical. I am just unclear as to if this is an acceptable approach; the summary table becomes increasingly difficult to understand without plots (which to my understanding, is common for models like these…)
noInt <- brm(bf(outcome ~ 0 + a + b,
a ~ 0 + group,
b ~ 0 + covar,
nl = TRUE),
prior = c(prior(normal(0, 1.75), nlpar = a),
prior(normal(0, 1.75), nlpar = b)),
data = fake_data,
family = bernoulli(link = 'logit'),
chains = 4, cores = 4)
wInt <- brm(bf(outcome ~ 0 + a + b + c,
a ~ 0 + group,
b ~ 0 + covar,
c ~ 0 + group:covar,
nl = TRUE),
prior = c(prior(normal(0, 1.75), nlpar = a),
prior(normal(0, 1.75), nlpar = b),
prior(normal(0, 1.75), nlpar = c)),
data = fake_data,
family = bernoulli(link = 'logit'),
chains = 4, cores = 4)
# Syntax similar to books... looks same as `:` terms
wInt2 <- brm(bf(outcome ~ 0 + a + b * covar,
a ~ 0 + group,
b ~ 0 + group,
nl = TRUE),
prior = c(prior(normal(0, 1.75), nlpar = a),
prior(normal(0, 1.75), nlpar = b)),
data = fake_data,
family = bernoulli(link = 'logit'),
chains = 4, cores = 4)
# Plot outcomes
gridExtra::grid.arrange(
plot(conditional_effects(wInt, 'group:covar'), plot = FALSE)[[1]] + ggtitle('w/Interaction'),
plot(conditional_effects(wInt2, 'group:covar'), plot = FALSE)[[1]] + ggtitle('w/Interaction2'),
plot(conditional_effects(noInt, 'group:covar'), plot = FALSE)[[1]] + ggtitle('w/oInteraction')
)
Summary tables
Conditional plots
System Info
- Operating System: Win10
- brms Version: 2.20.4