Marginal means using emmeans or manual calculation through add_fitted_draws don't agree

I’m trying to obtain marginal means from a model fitted with brms. I know I can do so easily using the emmeans package. But I can also do this by getting the draws manually, and then calculate the averages. The problem is that I’m getting similar means, but rather different variances in the posterior distributions of the marginal means.

I can demonstrate with a simple example, actually using rstanarm for convenience. Here I do the two calculations, and plot them side by side. The emmeans package clearly has wider posteriors, but I don’t understand why.

library(rstanarm)
library(emmeans)
library(tidybayes)
library(dplyr)
library(modelr)

mtcars2 <- mutate(mtcars, cyl = factor(cyl), vs = factor(vs))
m1 <- stan_lmer(qsec ~ vs + (vs | cyl), mtcars2)
summary(m1)

emm1 <- emmeans(m1, ~vs) %>% 
  gather_emmeans_draws()

emm2 <- data_grid(mtcars2, vs, cyl) %>% 
  add_fitted_draws(m1) %>%
  group_by(.draw, vs) %>% 
  summarise(.value = mean(.value))

emms <- bind_rows(emmeans = emm1, manual = emm2, .id = 'method')  

ggplot(emms, aes(.value, vs)) +
  geom_halfeyeh() +
  facet_grid(method ~ .)

  • Operating System: Windows 10
  • brms Version: 2.6.0

If you search for emmeans in the search magnifying glass, there are some other threads that have worked through similar problems.

I’m sorry, I’ve read all there is (all the hits for emmeans), and have found little that (as I can see) is directly solving my issue. I’m still not sure why these are not giving the same result.

I’m following this notebook by Matthew Kay for the manual approach.

I think the difference is that emmeans is giving you the result with random effects coefficients zeroed out, so those are the means conditional on the “typical” car. You should get the same thing if you do this:

emm3 <- data_grid(mtcars2, vs) %>% 
  add_fitted_draws(m1, re_formula = ~ 0)

The other approach is giving you the means for some hypothetical population with an equal number of cars having each possible value of cyl. Quoting from the notebook demonstrating the method:

It is very important to stress that this depends on a population made up of equal proportions of all possible values of B being a meaningful thing to talk about.

1 Like

Thanks a lot Matthew! This does clarify where the difference is coming from, and you’re right that re_formula = ~ 0 eliminates the difference. I guess I had always assumed that emmeans was averaging over the group levels as well.

I guess it is a bit non-intuitive to me that zeroing out the group level effect increases the variance. I’ll need to think a bit about what I actually want to do in my case.

1 Like