I am confused about what emmeans is averaging over with brms.
After fitting a generalized linear model with brms, I can pass the model object to emmeans() to obtain the posterior for marginal means the outcome on the probability or response scale. Since probabilities are non-linear in the parameters, the distribution of predictors affects these predictions. Accordingly, we need to specify the distribution/grid we want to average over for the marginal means.
To illustrate the source of confusion, let’s simulate some data and fit a beta regression. We’re interested in the marginal means of y for values of x, averaging over z.
library(brms)
library(emmeans)
set.seed(123)
# simulate data
df <- data.frame(y = runif(100),
x = sample(0:1, size = 100, replace = T),
z = as.character(sample(1:4, size = 100, replace = T)))
# fit beta model
mod <- brms::brm(formula = y ~ x + z + x:z,
data = df,
family = Beta(link = "logit", link_phi = "log"),
seed = 123)
Doesn’t include predictors in grid
If we don’t specify anything about the predictor grid, the resulting object does not contain any information about the distribution of the predictor we’re averaging over z. What distribution is it implicitly assuming?
# Specify nothing about grid
nogrid <- emmeans(object = mod, ~x, epred = TRUE)
nogrid
#> x emmean lower.HPD upper.HPD
#> 0 0.545 0.466 0.622
#> 1 0.459 0.386 0.530
#>
#> Point estimate displayed: median
#> HPD interval probability: 0.95
nogrid@grid
#> x .wgt.
#> 1 0 46
#> 2 1 54
If we explicitly tell emmeans to weight each cell of the grid by its frequency in the sample, we get identical implied weights and still no mention of z in the grid/output.
# Specify that expected values should be based on probabilities for each cell
# of the predictor grid, where the cell's contribution is weighted by frequency
weighted <- emmeans(object = mod, ~x, weights = "cells", epred = TRUE)
weighted
#> x emmean lower.HPD upper.HPD
#> 0 0.541 0.460 0.614
#> 1 0.462 0.393 0.535
#>
#> Point estimate displayed: median
#> HPD interval probability: 0.95
weighted@grid
#> x .wgt.
#> 1 0 46
#> 2 1 54
It also does not appear possible to use the at argument either to fix this, as the below code produces the same issue (although somewhat different estimates).
emmeans(object = mod, ~x, weights = "cells",
at = list(z = unique(df$z)), epred = TRUE)
Expected grid
While none of the above appear to average over z, I was expecting them to use a predictor grid like the below:
# What I was expecting:
grid_expected <- ref_grid(mod, weights = "cells")
grid_expected@grid
#> x z .wgt.
#> 1 0 1 15
#> 2 1 1 17
#> 3 0 2 13
#> 4 1 2 11
#> 5 0 3 9
#> 6 1 3 10
#> 7 0 4 9
#> 8 1 4 16
Rather than pass the grid I produced with ref_grid above to emmeans(), I’d like to understand what is going on with the other code.
Figured I’d ping @Russ_Lenth