Integrating emmeans with brms

A recent post sparked discussion on integrating emmeans with brms. Moving it to a new topic so that it can be found later. Thx @rvlenth for bringing this up!

FYI, what emmeans does is creates a grid of factor (and predictor) combinations, and for each node on the grid, the posterior MCMC sample of predictions at that node. Currently, the user can specify which parameter (in this case mu, sigma, or ndt. emmeans() may then further average selected ones of these together to obtain marginal means, back-transform them, etc.

What would be useful for a model like this is provisions to ensure that the grid involves all the predictors in all of the parts of the model; then for each node, the posterior sample of values of ndt + exp(mu.hat + 0.5*sigma.hat^2). I believe it wouldnā€™t be too hard to do this, and I could help with the needed additions to emm_basis.brms(). The code would be similar to existing code for prediction. Itā€™d help to have some understanding more generally across models handled by brms and how manageable is that number. I suppose we have various zero-inflated, hurdle, and truncated cases.

In essence, this is what the posterior_epred method of brms does as it computes the mean predictions for each family (https://github.com/paul-buerkner/brms/blob/master/R/posterior_epred.R).

Soā€¦ it seems like we could do some minor things to make it all work:

  • We need to add an argument or enhance an existing argument to both emmeans-support methods to provide for this. Would it make sense to trap the case dpar = "mean"? However it is specified, Iā€™ll refer to this as ā€œmeanā€ mode, subsequently.
  • In recover_data.brmsfit(), for ā€œmeanā€ mode, we need to collect the terms for all fixed-effect components of the model so that we will obtain a grid of reference levels for all predictors involved in those terms. This would be similar to emmeans:::recover_data.zeroinfl(..., mode = "response")
  • In emm_basis.brmsfit() in ā€œmeanā€ mode, it seems that all we will need to do is set the linfct slot to the identity matrix and post.beta slot to the result of posterior_epred(object, newdata = grid, re_formula = NA). (And then bhat and V to its column means and covariance matrix).

This would enhance whatā€™s possible with brms in the sense that emmeans has all kinds of built-in machinery for computing marginal averages thereof, contrasts, by variables, etc. In the example for posterior_epred(), for example, emmeansā€™s code picks up on the fact that carry is nested in period, so in obtaining marginal averages over carry, it would know to average separately for each period.

Your suggestions sound good. With regard to using posterior_epred, we have to be careful to disable everything except fixed effects though, since brms supports all sorts of additional terms, such as GPs and splines, which are probably another topic to discuss. Right now, it is not easily possible to (de)activate these terms but I am planning on adding such functionality in the future.

Right now, I donā€™t yet feel equipped to make the changes necessary to brms to be honest but would be happy to discuss this in more detail or review PRs in that direction of course.

1 Like

is this what the new dpar = ā€œmeanā€ does?

Could you clearly summarize why one would choose to use the posterior predictive distribution (dpar = ā€œmeanā€) and when to use the default posterior in estimating the marginal means (emmeans)?

I donā€™t know to be honest. There is no general answer. It depends on what you are interested it, that is, if you want to know the emmeans of a specific distributional parameters or of the implied PPD mean.

Perhaps the following ideas could help. If you are interested in a specific process of your data generating distribution that is expressed by a specific distrihutional parameter(s) computer the emmeans for those. If you want an overall summary of the mean predictions, dpar = ā€œmeanā€ could be a good option. For most models the main parameter ā€œmuā€ (default) is already the mean of the ppd so it makes literally no difference in these cases.

2 Likes

So what Iā€™m hearing is that the default of dpar = ā€˜muā€™ is fine for general emmeans used to compare effects to see if they are different.

What about point.est = ā€˜meanā€™ vs ā€˜medianā€™? I see the default in emmeans is ā€˜medianā€™ while ā€˜meanā€™ is used in the brms documentation.

1 Like