Calculating contrasts for 2 factor interaction term with unbalanced design

I fitted a hurdle gamma model with an interaction term. Everything works well. I get the posterior epreds and the posterior predictions using the tidybase::add_predicted_draws and tidybayes::add_epred_draws functions. I can plot the data and everything looks perfect.

But I am struggling with calculating contrasts. My factor levels look like this:

F1 <- c("reference", "A", "B", "C", "D", "E", "F")
F2 <- c("W", "X", "Y", "Z")
expand.grid(F1 = F1, F2 = F2)
#           F1 F2
# 1  reference  W
# 2          A  W
# 3          B  W
# 4          C  W
# 5          D  W
# 6          E  W
# 7          F  W
# 8  reference  X
# 9          A  X
# 10         B  X
# 11         C  X
# 12         D  X
# 13         E  X
# 14         F  X
# 15 reference  Y
# 16         A  Y
# 17         B  Y
# 18         C  Y
# 19         D  Y
# 20         E  Y
# 21         F  Y
# 22 reference  Z
# 23         A  Z
# 24         B  Z
# 25         C  Z
# 26         D  Z
# 27         E  Z
# 28         F  Z

This is my model:

m <- brm(bf(y ~ F1 * F2, hu ~ F1 * F2), family = "hurdle_gamma", data = d)

However, one complication is that the design is unbalanced, i.e. the number of observations for each factor combination varies. I would like to calculate the difference between reference:W and A:W, B:W, C:W,D:W, E:W, F:W; then reference:X and A:X, B:X, C:X, D:X, E:X, F:X; etc. until reference:Z and A:Z, B:Z, C:Z, D:Z, E:Z, F:Z

I was hoping someone can point me in the right direction. Thanks!

Welcome Stefan!

Are you familiar with the emmeans package?

I think this is what you want?

Hi Angelos! Thank you for your suggestion! What I would really like to do is to “simply” calculate the difference manually and get posterior distributions of differences which I then can summarize and further manipulate. The problem I’m having is the unequal sample sizes within those comparison. Thank you!

Can you explain how the unequal sample size gets in the way? Or share a reprex?

EDIT: I was focusing on the wrong thing (again)

Don’t use add_

library(tidybayes)
new = expand.grid(F1 = F1, F2 = F2)
epr = m %>% 
  epred_draws(newdata = new)

Yes! epred_draws() combined with new data via expand.grid() works for the contrasts! However, the epred plots (not the contrasts) look different when I compare add_epred_draws() vs epred_draws(newdata =). Below is a reprex. Would you mind checking whether this all makes sense? Thank you!

set.seed(2)
d <- tibble(
  F1 = rep(
    c("control", "treatment1", "treatment2", "treatment3"),
    3,
    each = 10
  ),
  F2 = rep(c("A", "B", "C"), each = 40),
  G = rep(c("G1", "G2", "G3"), each = 40),
  Y = rnorm(120, 10, 4)
)

m <- brm(Y ~ F1 * F2 + (1 | G), data = d, family = "gaussian")

# add_epred_draws and plots results
d %>%
  add_epred_draws(m)  %>%
  ggplot(aes(F1, .epred)) +
  stat_pointinterval(.width = c(0.66, 0.95), point_interval = "median_qi") +
  labs(x = "Factor1", y = "Y") +
  coord_flip() +
  facet_wrap( ~ F2, scales = "free") +
  theme_bw()

# calculate differences between reference group to other levels (F1) within 2nd factor (F2)
d_ex <- expand.grid(
  F1 = c("control", "treatment1", "treatment2", "treatment3"),
  F2 = c("A", "B", "C"),
  G = c("G1", "G2", "G3")
)

# this plot using epred_draws looks different when compared to add_epred_draws
m %>%
  epred_draws(newdata = d_ex)  %>%
  ggplot(aes(F1, .epred)) +
  stat_pointinterval(.width = c(0.66, 0.95), point_interval = "median_qi") +
  labs(x = "Factor1", y = "Y") +
  coord_flip() +
  facet_wrap( ~ F2, scales = "free") +
  theme_bw()


# calculating differences when using epred_draws works
m %>%
  epred_draws(newdata = d_ex) %>%
  group_by(.draw, F2, G) %>%
  mutate(control = .epred[F1 == "control"]) %>%
  mutate(diff = .epred - control) %>%
  filter(F1 != "control") %>%
  ggplot(aes(F1, diff)) +
  geom_hline(yintercept = 0, linetype = "dotted") +
  stat_pointinterval(.width = c(0.66, 0.95), point_interval = "median_qi") +
  labs(x = "Factor1", y = "Y") +
  coord_flip() +
  facet_wrap( ~ F2, scales = "free") +
  theme_bw() 

These differences go away if you specify re_formula = NA in add_epred_draws and epred_draws. Off the top of my head, I can’t explain why they exist when re_formula = NULL.

This might help.