A recent post sparked discussion on integrating `emmeans`

with `brms`

. Moving it to a new topic so that it can be found later. Thx @rvlenth for bringing this up!

FYI, what `emmeans`

does is creates a grid of factor (and predictor) combinations, and for each node on the grid, the posterior MCMC sample of predictions at that node. Currently, the user can specify which parameter (in this case mu, sigma, or ndt. `emmeans()`

may then further average selected ones of these together to obtain marginal means, back-transform them, etc.

What would be useful for a model like this is provisions to ensure that the grid involves all the predictors in all of the parts of the model; then for each node, the posterior sample of values of `ndt + exp(mu.hat + 0.5*sigma.hat^2)`

. I believe it wouldnāt be too hard to do this, and I could help with the needed additions to `emm_basis.brms()`

. The code would be similar to existing code for prediction. Itād help to have some understanding more generally across models handled by **brms** and how manageable is that number. I suppose we have various zero-inflated, hurdle, and truncated cases.

In essence, this is what the `posterior_epred`

method of brms does as it computes the mean predictions for each family (https://github.com/paul-buerkner/brms/blob/master/R/posterior_epred.R).

Soā¦ it seems like we could do some minor things to make it all work:

- We need to add an argument or enhance an existing argument to both emmeans-support methods to provide for this. Would it make sense to trap the case
`dpar = "mean"`

? However it is specified, Iāll refer to this as āmeanā mode, subsequently. - In
`recover_data.brmsfit()`

, for āmeanā mode, we need to collect the terms for all fixed-effect components of the model so that we will obtain a grid of reference levels for all predictors involved in those terms. This would be similar to`emmeans:::recover_data.zeroinfl(..., mode = "response")`

- In
`emm_basis.brmsfit()`

in āmeanā mode, it seems that all we will need to do is set the`linfct`

slot to the identity matrix and`post.beta`

slot to the result of`posterior_epred(object, newdata = grid, re_formula = NA)`

. (And then`bhat`

and`V`

to its column means and covariance matrix).

This would enhance whatās possible with **brms** in the sense that **emmeans** has all kinds of built-in machinery for computing marginal averages thereof, contrasts, `by`

variables, etc. In the example for `posterior_epred()`

, for example, **emmeans**ās code picks up on the fact that `carry`

is nested in `period`

, so in obtaining marginal averages over `carry`

, it would know to average separately for each `period`

.

Your suggestions sound good. With regard to using `posterior_epred`

, we have to be careful to disable everything except fixed effects though, since brms supports all sorts of additional terms, such as GPs and splines, which are probably another topic to discuss. Right now, it is not easily possible to (de)activate these terms but I am planning on adding such functionality in the future.

Right now, I donāt yet feel equipped to make the changes necessary to brms to be honest but would be happy to discuss this in more detail or review PRs in that direction of course.

is this what the new dpar = āmeanā does?

Could you clearly summarize why one would choose to use the posterior predictive distribution (dpar = āmeanā) and when to use the default posterior in estimating the marginal means (emmeans)?

I donāt know to be honest. There is no general answer. It depends on what you are interested it, that is, if you want to know the emmeans of a specific distributional parameters or of the implied PPD mean.

Perhaps the following ideas could help. If you are interested in a specific process of your data generating distribution that is expressed by a specific distrihutional parameter(s) computer the emmeans for those. If you want an overall summary of the mean predictions, dpar = āmeanā could be a good option. For most models the main parameter āmuā (default) is already the mean of the ppd so it makes literally no difference in these cases.

So what Iām hearing is that the default of dpar = āmuā is fine for general emmeans used to compare effects to see if they are different.

What about point.est = āmeanā vs āmedianā? I see the default in emmeans is āmedianā while āmeanā is used in the brms documentation.