Calculating contrasts for 2 factor interaction term with unbalanced design

mcsnowface · September 16, 2024, 4:21am

I fitted a hurdle gamma model with an interaction term. Everything works well. I get the posterior epreds and the posterior predictions using the tidybase::add_predicted_draws and tidybayes::add_epred_draws functions. I can plot the data and everything looks perfect.

But I am struggling with calculating contrasts. My factor levels look like this:

F1 <- c("reference", "A", "B", "C", "D", "E", "F")
F2 <- c("W", "X", "Y", "Z")
expand.grid(F1 = F1, F2 = F2)
#           F1 F2
# 1  reference  W
# 2          A  W
# 3          B  W
# 4          C  W
# 5          D  W
# 6          E  W
# 7          F  W
# 8  reference  X
# 9          A  X
# 10         B  X
# 11         C  X
# 12         D  X
# 13         E  X
# 14         F  X
# 15 reference  Y
# 16         A  Y
# 17         B  Y
# 18         C  Y
# 19         D  Y
# 20         E  Y
# 21         F  Y
# 22 reference  Z
# 23         A  Z
# 24         B  Z
# 25         C  Z
# 26         D  Z
# 27         E  Z
# 28         F  Z

This is my model:

m <- brm(bf(y ~ F1 * F2, hu ~ F1 * F2), family = "hurdle_gamma", data = d)

However, one complication is that the design is unbalanced, i.e. the number of observations for each factor combination varies. I would like to calculate the difference between reference:W and A:W, B:W, C:W,D:W, E:W, F:W; then reference:X and A:X, B:X, C:X, D:X, E:X, F:X; etc. until reference:Z and A:Z, B:Z, C:Z, D:Z, E:Z, F:Z

I was hoping someone can point me in the right direction. Thanks!

amynang · September 16, 2024, 6:42am

Welcome Stefan!

Are you familiar with the emmeans package?

I think this is what you want?

mcsnowface · September 16, 2024, 11:11am

Hi Angelos! Thank you for your suggestion! What I would really like to do is to “simply” calculate the difference manually and get posterior distributions of differences which I then can summarize and further manipulate. The problem I’m having is the unequal sample sizes within those comparison. Thank you!

amynang · September 16, 2024, 11:24am

Can you explain how the unequal sample size gets in the way? Or share a reprex?

EDIT: I was focusing on the wrong thing (again)

Don’t use add_

library(tidybayes)
new = expand.grid(F1 = F1, F2 = F2)
epr = m %>% 
  epred_draws(newdata = new)

mcsnowface · September 16, 2024, 6:49pm

Yes! epred_draws() combined with new data via expand.grid() works for the contrasts! However, the epred plots (not the contrasts) look different when I compare add_epred_draws() vs epred_draws(newdata =). Below is a reprex. Would you mind checking whether this all makes sense? Thank you!

set.seed(2)
d <- tibble(
  F1 = rep(
    c("control", "treatment1", "treatment2", "treatment3"),
    3,
    each = 10
  ),
  F2 = rep(c("A", "B", "C"), each = 40),
  G = rep(c("G1", "G2", "G3"), each = 40),
  Y = rnorm(120, 10, 4)
)

m <- brm(Y ~ F1 * F2 + (1 | G), data = d, family = "gaussian")

# add_epred_draws and plots results
d %>%
  add_epred_draws(m)  %>%
  ggplot(aes(F1, .epred)) +
  stat_pointinterval(.width = c(0.66, 0.95), point_interval = "median_qi") +
  labs(x = "Factor1", y = "Y") +
  coord_flip() +
  facet_wrap( ~ F2, scales = "free") +
  theme_bw()

# calculate differences between reference group to other levels (F1) within 2nd factor (F2)
d_ex <- expand.grid(
  F1 = c("control", "treatment1", "treatment2", "treatment3"),
  F2 = c("A", "B", "C"),
  G = c("G1", "G2", "G3")
)

# this plot using epred_draws looks different when compared to add_epred_draws
m %>%
  epred_draws(newdata = d_ex)  %>%
  ggplot(aes(F1, .epred)) +
  stat_pointinterval(.width = c(0.66, 0.95), point_interval = "median_qi") +
  labs(x = "Factor1", y = "Y") +
  coord_flip() +
  facet_wrap( ~ F2, scales = "free") +
  theme_bw()


# calculating differences when using epred_draws works
m %>%
  epred_draws(newdata = d_ex) %>%
  group_by(.draw, F2, G) %>%
  mutate(control = .epred[F1 == "control"]) %>%
  mutate(diff = .epred - control) %>%
  filter(F1 != "control") %>%
  ggplot(aes(F1, diff)) +
  geom_hline(yintercept = 0, linetype = "dotted") +
  stat_pointinterval(.width = c(0.66, 0.95), point_interval = "median_qi") +
  labs(x = "Factor1", y = "Y") +
  coord_flip() +
  facet_wrap( ~ F2, scales = "free") +
  theme_bw()

amynang · September 16, 2024, 9:43pm

These differences go away if you specify re_formula = NA in add_epred_draws and epred_draws. Off the top of my head, I can’t explain why they exist when re_formula = NULL.

This might help.

Topic		Replies	Views
Estimating contrasts (code peer-review) brms specification , brms , emmeans	3	822	May 20, 2021
Computing and summarizing contrasts from brmsfit brms	2	883	November 8, 2019
Need help understanding hurdle (hurdle_gamma) models using brms brms	2	1461	July 19, 2020
Computing Bayes Factor in hurdle model brms specification	3	743	December 7, 2020
Computation of contrasts using emmeans based on brms input in R brms emmeans	1	343	February 3, 2024

Calculating contrasts for 2 factor interaction term with unbalanced design

Related topics