I’m working on analyzing some experimental ecological data, where some demographic rate (y
) is expected to vary by species
and experimental treatment
. Let’s say that I have some prior knowledge about the range of y
that I would like to use to specify weakly informative priors, but that this knowledge is agnostic to species
or treatment
. As such, I would like to set priors that yield the same (or similar) distribution of prior expectations for any given combination of species
and treatment
. Perhaps I’m missing something obvious, but I’ve been struggling to achieve this using the R formula syntax in brms
.
Here is some R code to generate synthetic data and simulate prior expectations:
library(brms)
library(tidybayes)
library(ggplot2)
set.seed(505)
# Specify study design
n <- 1000
n_species <- 8
n_treatment <- 2
# Specify coefficients
Intercept <- 0
b_species <- c(0, rnorm(n_species - 1, 0, 1))
b_treatment <- c(0, rnorm(n_treatment - 1, -0.3, 0.5))
# Simulate data
species <- sample(seq(1:n_species), n, replace = TRUE)
treatment <- sample(seq(1:n_treatment), n, replace = TRUE)
df <- data.frame(species, treatment)
logit_mu <- apply(df, 1, function(x) Intercept + b_species[x[1]] + b_treatment[x[2]])
mu <- inv_logit_scaled(logit_mu)
trials <- 30
y <- rbinom(n, trials, mu)
df$species <- as.factor(df$species)
df$treatment <- as.factor(df$treatment)
df$y <- y
df$trials <- trials
# Inspect simulated data
ggplot(df, aes(x = species, y = y, fill = treatment)) +
geom_boxplot() +
theme_bw()
# Specify model formula
form <- bf(
y | trials(trials) ~ species*treatment,
family = binomial()
)
# Specify priors
priors <- c(
set_prior("normal(0, 1.5)", class = "Intercept"),
set_prior("normal(0, 1.5)", class = "b")
)
# Sample from priors
fit_prior <- brm(
form,
data = df,
prior = priors,
sample_prior = "only"
)
# Prepare new data for prediction
newdf <- expand.grid(
species = levels(df$species),
treatment = levels(df$treatment),
trials = 1
)
# Plot prior expectations
newdf |>
add_epred_draws(fit_prior) |>
ggplot(aes(x = .epred, color = species)) +
geom_density() +
facet_wrap(vars(treatment), labeller = "label_both") +
xlab("y") +
theme_bw()
Simulated data:
Prior expectations:
Although the prior means are the same for all species and treatments (because that is how I specified the priors), the distributions of prior expectations are quite different. I take it that this is a consequence of how species
1 with treatment
1 is being used as the reference level, with each successive coefficient adding further spread around the prior mean. This might be especially stark here because of the logit link function.
Removing the intercept (y ~ 0 + species*treatment
) can make it so that prior expectations are the same for all species in treatment
1, but the deviations will still persist in treatment
2. I could achieve the desired consistency of prior implications by making the priors for b
really narrow, but this doesn’t seem right considering the meaning of those coefficients.
I think my main issue is that these factor levels are essentially exchangeable indices (?), so I would expect them to yield the same expectations. Does what I’m trying to achieve here make sense? Does anyone have recommendations for how to specify priors for multiple categorical predictors to this end? Or perhaps how to better think about the Intercept
and b
terms in brms
?
System info:
- Operating System: macOS Ventura 13.4
- brms Version: 2.20.4