Confusion about the difference between posterior_epred() and posterior_predict() in a mixed effects modelling context

Hi everyone,

This is my first time posting to this forum, but I am hopeful someone here can help me understand the difference between the functions posterior_epred() and posterior_predict() in the context of mixed effects modelling.

These functions are used in various Bayesian R packages (e.g., rstanarm, brms, marginaleffects, brmsmargins) but it is not clear to me how they differ and, most importantly, when to use one versus the other in a mixed effects modelling context.

I tried to get some insights via Twitter, which were helpful, but I find I am still confused about this. (My mind likes to complicate things and it’s entirely possible that this issue is not as complicated as I make it out to be.)

My confusion started after reading the help file for a close relative of these functions, posterior_linpred() (“Posterior distribution of the (possibly transformed) linear predictor”, posterior_linpred.stanreg, rstanarm documentation at mc-stan.org). According to this help file, the purpose of posterior_linpred() is to:

“Extract the posterior draws of the linear predictor, possibly transformed by the inverse-link function. This function is occasionally useful, but it should be used sparingly: inference and model checking should generally be carried out using the posterior predictive distribution (i.e., using posterior_predict).”

Prior to reading the above statement, I thought that posterior_epred() is used to compute the expected value of a response variable on the natural response scale for given predictor values, with all random effects in the model set to 0. Furthermore, I thought that posterior_predict() is used to predict new response values based on known predictor values and specified values of the random effects. But the strong wording about using posterior_linpred() (and, I assumed, its close relative posterior_epred()) sparingly stopped me in my tracks.

Now I wonder things like:

  1. Is posterior_epred() a special case of posterior_predict() or is it a totally different animal?

  2. Does posterior_epred() incorporate information not only on WHAT is being estimated but also on HOW uncertainty is computed for the quantity being estimated? In particular, does posterior_epred() estimate a mean response value AND the corresponding uncertainty by completely ignoring random error AND random effects?

  3. Does posterior_predict() incorporate information not only on WHAT is being predicted but also on HOW uncertainty is computed for the quantity being predicted? In particular, does posterior_predict() predict an individual response value AND the corresponding uncertainty by accounting for random error AND random effects (unless we instruct it otherwise)?

  4. If posterior_epred() is a different animal than posterior_predict(), when should we use one versus the other in a mixed effects modelling context?

(Note that I am using terminology like estimation and prediction that I feel familiar with in a Frequentist context - in a Bayesian context, it seems to me that the distinction between the two is lost and everything is referred to as prediction.)

Inspired by Twitter, I went as far as to put together a table listing several estimation and prediction scenarios I could think of in the context of a relatively simple linear mixed effects model. As a practitioner, I would really love to know whether I should use posterior_epred() or posterior_predict() for each of these particular scenarios:

Any answers on the above questions and clarifications on the above table (i.e., what function to use for each task) would be much appreciated.

At this point, I am starting to believe that I shouldn’t even bother with posterior_epred() and should just use posterior_predict() for every single scenario in my table (based on the help file mentioned above and on some of the responses I received on Twitter). But I would like to understand a bit more about why, if that is at all possible.

Thank you very much!

Isabella


The distinction between epred and predict is always whether you are talking about the distribution (uncertainty) of individual cases (predict) or the average/expectation (epred). E.g., in the context of a single-level normal regression model, does the distribution include just the uncertainty in the mean/average/expectation (epred), or also the individual-level variation (sigma) and its uncertainty (predict)?
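For example, with brms (a minimal sketch on the built-in mtcars data; the model and variables are just placeholders, not anyone’s actual analysis):

```r
library(brms)

# Minimal single-level normal regression, purely for illustration
fit <- brm(mpg ~ wt, data = mtcars, family = gaussian())

epred_draws <- posterior_epred(fit)   # draws x observations: uncertainty in the mean only
pred_draws  <- posterior_predict(fit) # draws x observations: mean uncertainty + sigma

# Predictive draws are more spread out because they add individual-level variation
apply(epred_draws, 2, sd)[1:5]
apply(pred_draws, 2, sd)[1:5]
```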

In a mixed effects model, there is the complication that we have both individuals and groups. The choice of epred vs predict only concerns “individuals vs average/expectation”; it has nothing to do with groups or group effects. There, predict concerns the distribution of individuals within a group, and epred concerns uncertainty in the average of a group.

How you handle the group-level parameters doesn’t involve a choice between these functions; that is controlled by the re_formula argument.

  1. If you want to know uncertainty in existing group A’s mean: epred with re_formula = NULL (include all random effects, estimated from group A’s data).

  2. If you want to know uncertainty in individuals within existing group A: predict with re_formula = NULL.

  3. If you want to know uncertainty in the average of a new group you don’t have any data on: set the grouping variable to a new value and use epred with re_formula = NULL, which will draw random values for the group parameters and make the distribution wider, since we have no data on that group’s mean.

  4. If you want to know uncertainty in individuals within that new group: predict with a new group value and re_formula = NULL.

Again, re_formula controls what group-parameter data we are using/conditioning on to get group-level uncertainty; epred vs predict determines whether we want uncertainty in the group average or in individual cases.
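In brms code, those four cases might look something like this (a sketch on the lme4::sleepstudy data, not the original poster’s model; the allow_new_levels and sample_new_levels settings are my assumptions about how the new group’s parameters get drawn):

```r
library(brms)

# Hypothetical mixed model, purely for illustration
fit_mixed <- brm(Reaction ~ Days + (1 | Subject), data = lme4::sleepstudy)

nd_existing <- data.frame(Days = 5, Subject = "308")      # a subject seen in the data
nd_new      <- data.frame(Days = 5, Subject = "new_one")  # a subject not in the data

# 1. Uncertainty in existing subject 308's mean
posterior_epred(fit_mixed, newdata = nd_existing, re_formula = NULL)

# 2. Uncertainty in individual observations from subject 308
posterior_predict(fit_mixed, newdata = nd_existing, re_formula = NULL)

# 3. Uncertainty in the mean of a new, unobserved subject
posterior_epred(fit_mixed, newdata = nd_new, re_formula = NULL,
                allow_new_levels = TRUE, sample_new_levels = "gaussian")

# 4. Uncertainty in individual observations from that new subject
posterior_predict(fit_mixed, newdata = nd_new, re_formula = NULL,
                  allow_new_levels = TRUE, sample_new_levels = "gaussian")
```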

If we want uncertainty in the population average, use epred with re_formula = NA, which fixes all of the group-level parameters to 0. predict with re_formula = NA doesn’t make much sense to me: it would describe uncertainty in individuals within a group whose parameters are all exactly zero, which is not a useful estimate IMO.
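Continuing the same sketch, the population-average case would be something like:

```r
# Uncertainty in the population-average expectation: group-level effects fixed to 0
posterior_epred(fit_mixed, newdata = nd_existing, re_formula = NA)
```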

At a purely computational level, with a normal likelihood, epred vs predict is just a difference in whether sigma and its uncertainty are included in the distribution or not. The handling of random effects doesn’t differ between the two; that’s the re_formula argument.
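For a gaussian model, that difference can be reproduced by hand from the posterior draws. A sketch, using the simple mpg ~ wt fit from the earlier example ("sigma" is the residual SD parameter in a gaussian brms model, and I’m assuming the draws line up in the same order across the two extractions):

```r
mu_draws    <- posterior_epred(fit)    # draws x observations: conditional means only
sigma_draws <- as_draws_df(fit)$sigma  # residual SD, one value per posterior draw

# Rebuild posterior_predict by hand: add normal noise to each draw's means
manual_pred <- mu_draws
for (i in seq_len(nrow(mu_draws))) {
  manual_pred[i, ] <- rnorm(ncol(mu_draws), mean = mu_draws[i, ], sd = sigma_draws[i])
}

# manual_pred should match posterior_predict(fit) up to Monte Carlo noise
```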

“predict” has little if anything to do with “new observations” vs “observed data”. Both epred and predict describe post-data, model-implied uncertainty: in the average (epred) or in individual cases (predict). The same applies in frequentist and Bayesian modeling.


Thanks for a great question and an excellent response.

I am curious to know whether this is a correct interpretation of your answer, but my take is that it boils down to the scientific discipline. My background is in cognitive psychology, where (as I understand the discipline) the average/expectation (epred) matters more. However, in the fields of motor learning and biomechanics, it is more accepted that there are different solutions for accomplishing the same goal/task. For example, in alpine ski racing, there is considerable variation in technique, even among the best skiers. Therefore, although it is interesting to look at the average/expectation (epred) solution/response, it is (perhaps) even more interesting to talk about the distribution (uncertainty) of individual cases (predict).

I am trying to understand the different philosophies and relate that to my research. Is my interpretation reasonable?

It really comes down to whether you want to compare the groups only in terms of their average tendencies or more completely in terms of the distinctiveness of their distributions overall. For example, in cognitive psychology, I think often conclusions have been overly strong because they have only looked at mean differences in (log)normal variables and ignored the huge overlap in distributions across conditions. Much of the research on gender differences in differential psych has this problem, for example.

Now, sometimes a posterior_predict distribution isn’t really that useful (e.g., for a bernoulli likelihood, the prediction interval is always 0–1 because the response is binary), but for non-categorical outcomes, I think posterior_predict is usually more appropriate for giving an accurate sense of the size of effects and their consequences.
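For instance, a small sketch (again just an illustrative brms model, here a made-up binary outcome on mtcars):

```r
fit_b <- brm(am ~ wt, data = mtcars, family = bernoulli())

posterior_epred(fit_b)[1:3, 1:3]    # draws of Pr(y = 1): probabilities between 0 and 1
posterior_predict(fit_b)[1:3, 1:3]  # draws of y itself: only 0s and 1s
```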


Thanks @bwiernik. That makes sense.

My apologies for bumping an older post.

The explanation above is clear and makes sense to me. But I’m wondering how the distinction manifests in the calculations (Stan code).

Most brms models build up a mu parameter for each row of the data and use that parameter in the specified family/distribution.

Since random effects for groups/subjects can be included in either predict or epred, what is “averaged over” in the case of epred vs predict?

What would a generated quantities block look like for epred vs predict for something like a normal_rng(mu, sigma) model, where mu is constructed from X * b and random effects are then added per row of data?

I’m thinking that epred = mu (after applying the inverse link) and predict = the appropriate _rng.


My theory falls apart with a Binomial model.

In trying to examine the brms code for beta_binomial_epred, I thought I could accomplish it with (mu[i] * Phi) * trials[i].

I was definitely wrong! This does not work.