# What grid does passing a brms model to emmeans use when a predictor grid is not provided?

I am confused about what `emmeans` is averaging over with `brms`.

After fitting a generalized linear model with `brms`, I can pass the model object to `emmeans()` to obtain the posterior for marginal means the outcome on the probability or response scale. Since probabilities are non-linear in the parameters, the distribution of predictors affects these predictions. Accordingly, we need to specify the distribution/grid we want to average over for the marginal means.

To illustrate the source of confusion, let’s simulate some data and fit a beta regression. We’re interested in the marginal means of `y` for values of `x`, averaging over `z`.

``````library(brms)
library(emmeans)
set.seed(123)

# simulate data
df <- data.frame(y = runif(100),
x = sample(0:1, size = 100, replace = T),
z = as.character(sample(1:4, size = 100, replace = T)))

# fit beta model
mod <- brms::brm(formula = y ~ x + z + x:z,
data = df,
seed = 123)
``````

# Doesn’t include predictors in grid

If we don’t specify anything about the predictor grid, the resulting object does not contain any information about the distribution of the predictor we’re averaging over `z`. What distribution is it implicitly assuming?

``````# Specify nothing about grid
nogrid <- emmeans(object = mod, ~x, epred = TRUE)
nogrid
#>  x emmean lower.HPD upper.HPD
#>  0  0.545     0.466     0.622
#>  1  0.459     0.386     0.530
#>
#> Point estimate displayed: median
#> HPD interval probability: 0.95

nogrid@grid
#>   x .wgt.
#> 1 0    46
#> 2 1    54
``````

If we explicitly tell `emmeans` to weight each cell of the grid by its frequency in the sample, we get identical implied weights and still no mention of `z` in the grid/output.

``````# Specify that expected values should be based on probabilities for each cell
# of the predictor grid, where the cell's contribution is weighted by frequency
weighted <- emmeans(object = mod, ~x, weights = "cells", epred = TRUE)
weighted
#>  x emmean lower.HPD upper.HPD
#>  0  0.541     0.460     0.614
#>  1  0.462     0.393     0.535
#>
#> Point estimate displayed: median
#> HPD interval probability: 0.95
weighted@grid
#>   x .wgt.
#> 1 0    46
#> 2 1    54
``````

It also does not appear possible to use the `at` argument either to fix this, as the below code produces the same issue (although somewhat different estimates).

``````emmeans(object = mod, ~x, weights = "cells",
at = list(z = unique(df\$z)), epred = TRUE)

``````

# Expected grid

While none of the above appear to average over `z`, I was expecting them to use a predictor grid like the below:

``````# What I was expecting:
grid_expected <- ref_grid(mod, weights = "cells")
grid_expected@grid
#>   x z .wgt.
#> 1 0 1    15
#> 2 1 1    17
#> 3 0 2    13
#> 4 1 2    11
#> 5 0 3     9
#> 6 1 3    10
#> 7 0 4     9
#> 8 1 4    16
``````

Rather than pass the grid I produced with `ref_grid` above to `emmeans()`, I’d like to understand what is going on with the other code.

Figured I’d ping @Russ_Lenth

Honestly, all this stuff is pretty thoroughly documented.

The important things to know relative to this inquiry are:

1. Unless specified otherwise, marginal means are computed with equal weights. The `.wgt`. column in the `@grid` slot contains what might be needed in case the user doesn’t want equal weights. See the help for `emmeans` with attention to the `weights` argument, and the section devoted to weighting. See also vignette excerpts Basics of estimated marginal means and Working with messy data.

2. With generalized linear models, averaging is done on the link scale, then back-transformed. If you want it back-transformed before averaging, use `regrid()` (not to be confused with `ref_grid`) on the reference grid before calling `emmeans()`. See again the documentation for `emmeans` and for `ref_grid` and `regrid`, and the vignette Transformations and link functions in emmeans.

3. By the way, there is also an index to vignette topics to make things easier to find: Index of vignette topics

I think those two points cover the confusion expressed here.

Russ

PS – the call `ref_grid(mod, weights = "cells")` makes no sense. `weights` is an argument to `emmeans`, not to `ref_grid`, so it is ignored in the `ref_grid` call.

Thank you for the thoughtful reply and catching the call to `weights` in `ref_grid()`. I understand what weights are being used.

I realized my misunderstanding was due to how `specs` and `at` relate to each other. While `at` allows the user to choose a set of values for a (sub)set of predictors over which to predict, with each cell of the joint distribution of the unspecified variables that enter the model otherwise equally weighted by default, `specs` is what actually determines the values for which you want to obtain the marginal means - averaging over the predictor distribution for each of the variables not passed to `specs`.

Accordingly, while I was wondering why the summary of the `emmeans` output was not showing the variable `z` when I specified `at = list(z = unique(df\$z))`, my estimates were changing because this meant that I obtained an link-scale prediction for each value of `unique(df\$z)` for each combination of the other variables in the model, those values were averaged, and then the prediction was returned on the response scale for each value of `x`. If I wanted to get `epred` values for the joint distribution of `x` and `unique(df\$z)`, I should have used `~ x + z` or `c("x","z")`.

I tried obtaining the reference grid with and without specifying the `z` levels:

``````> ref_grid(mod)
'emmGrid' object with variables:
x = 0, 1
z = 1, 2, 3, 4
Transformation: “logit”

> ref_grid(mod, at = list(z = unique(df\$z)))
'emmGrid' object with variables:
x = 0, 1
z = 2, 4, 1, 3
Transformation: “logit”
``````

That is, it makes no difference, except for the order of levels of z. I also have:

``````> (emm <- emmeans(mod, "x"))
NOTE: Results may be misleading due to involvement in interactions
x emmean lower.HPD upper.HPD
0  0.181    -0.128     0.535
1 -0.170    -0.472     0.143

Point estimate displayed: median
Results are given on the logit (not the response) scale.
HPD interval probability: 0.95
``````

… and the same result with adding `at = list(z = unique(df\$z))`. But what I also notice is that there is no message saying that the results were averaged over z, which is an annotation shown when we do a frequentist summary:

``````> emmeans(mod, "x", frequentist = TRUE)
NOTE: Results may be misleading due to involvement in interactions
x emmean    SE  df asymp.LCL asymp.UCL
0  0.183 0.170 Inf    -0.150     0.517
1 -0.171 0.157 Inf    -0.478     0.137

Results are averaged over the levels of: z
Results are given on the logit (not the response) scale.
Confidence level used: 0.95
``````

This appears to be an oversight in the `hpd.summary` function. I will look at it and fix it, as it can create confusion if we don’t say what variables were averaged over. And the information is there:

``````> emm@misc\$avgd.over
 "z"
``````
1 Like