I am confused about what emmeans
is averaging over with brms
.
After fitting a generalized linear model with brms
, I can pass the model object to emmeans()
to obtain the posterior for marginal means the outcome on the probability or response scale. Since probabilities are non-linear in the parameters, the distribution of predictors affects these predictions. Accordingly, we need to specify the distribution/grid we want to average over for the marginal means.
To illustrate the source of confusion, let’s simulate some data and fit a beta regression. We’re interested in the marginal means of y
for values of x
, averaging over z
.
library(brms)
library(emmeans)
set.seed(123)
# simulate data
df <- data.frame(y = runif(100),
x = sample(0:1, size = 100, replace = T),
z = as.character(sample(1:4, size = 100, replace = T)))
# fit beta model
mod <- brms::brm(formula = y ~ x + z + x:z,
data = df,
family = Beta(link = "logit", link_phi = "log"),
seed = 123)
Doesn’t include predictors in grid
If we don’t specify anything about the predictor grid, the resulting object does not contain any information about the distribution of the predictor we’re averaging over z
. What distribution is it implicitly assuming?
# Specify nothing about grid
nogrid <- emmeans(object = mod, ~x, epred = TRUE)
nogrid
#> x emmean lower.HPD upper.HPD
#> 0 0.545 0.466 0.622
#> 1 0.459 0.386 0.530
#>
#> Point estimate displayed: median
#> HPD interval probability: 0.95
nogrid@grid
#> x .wgt.
#> 1 0 46
#> 2 1 54
If we explicitly tell emmeans
to weight each cell of the grid by its frequency in the sample, we get identical implied weights and still no mention of z
in the grid/output.
# Specify that expected values should be based on probabilities for each cell
# of the predictor grid, where the cell's contribution is weighted by frequency
weighted <- emmeans(object = mod, ~x, weights = "cells", epred = TRUE)
weighted
#> x emmean lower.HPD upper.HPD
#> 0 0.541 0.460 0.614
#> 1 0.462 0.393 0.535
#>
#> Point estimate displayed: median
#> HPD interval probability: 0.95
weighted@grid
#> x .wgt.
#> 1 0 46
#> 2 1 54
It also does not appear possible to use the at
argument either to fix this, as the below code produces the same issue (although somewhat different estimates).
emmeans(object = mod, ~x, weights = "cells",
at = list(z = unique(df$z)), epred = TRUE)
Expected grid
While none of the above appear to average over z
, I was expecting them to use a predictor grid like the below:
# What I was expecting:
grid_expected <- ref_grid(mod, weights = "cells")
grid_expected@grid
#> x z .wgt.
#> 1 0 1 15
#> 2 1 1 17
#> 3 0 2 13
#> 4 1 2 11
#> 5 0 3 9
#> 6 1 3 10
#> 7 0 4 9
#> 8 1 4 16
Rather than pass the grid I produced with ref_grid
above to emmeans()
, I’d like to understand what is going on with the other code.
Figured I’d ping @Russ_Lenth