Thanks for the suggestions, @stijn and @dilsherdhillon . I’ll add a bit more context. I’m running this model for hundreds of proteins and want to heavily regularize the contrasts. The idea of “fusion” as discussed in the link in my original post is that some groups may be essentially no different, so the model-estimated means should be very close or equal for those groups. This is what I’m trying to accomplish, and one way of doing it is to put regularizing priors on all contrasts. I think this can be accomplished in Stan (Gelman post) but I was wondering if it’s possible in brms
.
Normally I would fit a varying intercepts model like y ~ 1 + (1 | level)
. The problem is that the levels have very different sample sizes, so the groups are unequally regularized. This is usually a beneficial feature of a multilevel model, but in this case, I’m interested in all pairwise contrasts, so this can actually result in some contrasts not being regularized. Simulated example below:
- y | X \in {GroupA, GroupB, GroupC} \sim N(3.5, 1)
- y | X \in {GroupD, GroupE} \sim N(6, 1)
Sample sizes are N=50 for Groups A, B, C, E and N=5 for Group D
library(tidyverse)
library(brms)
library(tidybayes)
set.seed(1)
sim_data <- data.frame(y = c(rnorm(50*3, 3.5, 1),
rnorm(50+5, 6, 1)),
X = rep(paste0("Group", LETTERS[1:5]),
c(50, 50, 50, 5, 50))) %>%
mutate(X = as.factor(X))
Difference in sample means of Group E and Group D is 0.605.
With a HalfNormal(0, 1) prior on the scale parameter.
sim_fit_multilevel <- sim_data %>%
brm(y ~ 1 + (1 | X),
data=.,
prior = set_prior("normal(0, 1)", class = "sd"),
seed = 123,
chains = 1,
iter = 3000)
Estimate the difference in model-estimated means:
sim_data %>%
distinct(X) %>%
add_fitted_draws(model = sim_fit_multilevel) %>%
ungroup %>%
select(X, .draw, .value) %>%
spread(key=X, value=.value) %>%
median_qi(contrastDE = GroupE - GroupD)
Estimated difference in means of Group E and Group D is 0.702 (-0.172, 1.54), which is larger than the difference in sample means.
With a tighter HalfCauchy(0, 0.1) prior on the scale parameter.
sim_fit2_multilevel <- sim_data %>%
brm(y ~ 1 + (1 | X),
data=.,
prior = set_prior("cauchy(0, 0.1)", class = "sd"),
seed = 123,
control = list(adapt_delta = 0.9),
chains = 1,
iter = 3000)
Estimate the difference in model-estimated means:
sim_data %>%
distinct(X) %>%
add_fitted_draws(model = sim_fit2_multilevel) %>%
ungroup %>%
select(X, .draw, .value) %>%
spread(key=X, value=.value) %>%
median_qi(contrastDE = GroupE - GroupD)
Estimated difference in means of Group E and Group D is 0.715 (-0.143, 1.57), again larger than the difference in sample means.