I’m working on analyzing some experimental ecological data, where some demographic rate (`y`

) is expected to vary by `species`

and experimental `treatment`

. Let’s say that I have some prior knowledge about the range of `y`

that I would like to use to specify weakly informative priors, but that this knowledge is agnostic to `species`

or `treatment`

. As such, I would like to set priors that yield the same (or similar) distribution of prior expectations for any given combination of `species`

and `treatment`

. Perhaps I’m missing something obvious, but I’ve been struggling to achieve this using the R formula syntax in `brms`

.

Here is some R code to generate synthetic data and simulate prior expectations:

```
library(brms)
library(tidybayes)
library(ggplot2)
set.seed(505)
# Specify study design
n <- 1000
n_species <- 8
n_treatment <- 2
# Specify coefficients
Intercept <- 0
b_species <- c(0, rnorm(n_species - 1, 0, 1))
b_treatment <- c(0, rnorm(n_treatment - 1, -0.3, 0.5))
# Simulate data
species <- sample(seq(1:n_species), n, replace = TRUE)
treatment <- sample(seq(1:n_treatment), n, replace = TRUE)
df <- data.frame(species, treatment)
logit_mu <- apply(df, 1, function(x) Intercept + b_species[x[1]] + b_treatment[x[2]])
mu <- inv_logit_scaled(logit_mu)
trials <- 30
y <- rbinom(n, trials, mu)
df$species <- as.factor(df$species)
df$treatment <- as.factor(df$treatment)
df$y <- y
df$trials <- trials
# Inspect simulated data
ggplot(df, aes(x = species, y = y, fill = treatment)) +
geom_boxplot() +
theme_bw()
# Specify model formula
form <- bf(
y | trials(trials) ~ species*treatment,
family = binomial()
)
# Specify priors
priors <- c(
set_prior("normal(0, 1.5)", class = "Intercept"),
set_prior("normal(0, 1.5)", class = "b")
)
# Sample from priors
fit_prior <- brm(
form,
data = df,
prior = priors,
sample_prior = "only"
)
# Prepare new data for prediction
newdf <- expand.grid(
species = levels(df$species),
treatment = levels(df$treatment),
trials = 1
)
# Plot prior expectations
newdf |>
add_epred_draws(fit_prior) |>
ggplot(aes(x = .epred, color = species)) +
geom_density() +
facet_wrap(vars(treatment), labeller = "label_both") +
xlab("y") +
theme_bw()
```

Simulated data:

Prior expectations:

Although the prior *means* are the same for all species and treatments (because that is how I specified the priors), the *distributions* of prior expectations are quite different. I take it that this is a consequence of how `species`

1 with `treatment`

1 is being used as the reference level, with each successive coefficient adding further spread around the prior mean. This might be especially stark here because of the logit link function.

Removing the intercept (`y ~ 0 + species*treatment`

) can make it so that prior expectations are the same for all species in `treatment`

1, but the deviations will still persist in `treatment`

2. I could achieve the desired consistency of prior implications by making the priors for `b`

really narrow, but this doesn’t seem right considering the meaning of those coefficients.

I think my main issue is that these factor levels are essentially exchangeable indices (?), so I would expect them to yield the same expectations. Does what I’m trying to achieve here make sense? Does anyone have recommendations for how to specify priors for multiple categorical predictors to this end? Or perhaps how to better think about the `Intercept`

and `b`

terms in `brms`

?

System info:

- Operating System: macOS Ventura 13.4
- brms Version: 2.20.4