Expected Value of Posterior vs. Posterior of Expected Value with epred

rwilcom · August 10, 2022, 8:01pm

The brms documentation states that posterior_epred.brmsfit() returns “Expected Values of the Posterior Predictive Distribution.” My understanding is that, for the model y = \alpha + \beta X + \epsilon, it returns the posterior of \alpha + \beta X at a given X for which to predict.

However, isn’t this actually showing us the posterior of the expected value and not the expected value of the posterior?

scholz · August 11, 2022, 7:40am

I guess McElreath can explain that a lot better than I can (Statistical Rethinking 2022 Lecture 02 - Bayesian Inference - YouTube), or see Chapter 3 of Statistical Rethinking.

For posterior_predict you draw a random sample from the respective RNG of the likelihood, based on the parameter samples in your posterior.
For posterior_epred you calculate the mean of your likelihood for each of those posterior samples.
E.g.:

posterior_predict_kumaraswamy <- function(i, prep, ...) {
  mu <- brms::get_dpar(prep, "mu", i = i) # mu for the kumaraswamy is the median
  p <- brms::get_dpar(prep, "p", i = i)
  return(rkumaraswamy(prep$ndraws, mu, p))
}

posterior_epred_kumaraswamy <- function(prep) {
  mu <- brms::get_dpar(prep, "mu") # mu for the kumaraswamy is the median
  p <- brms::get_dpar(prep, "p")
  q <- -(log(2) / log1p(-mu^p))
  return(q * beta((1 + 1 / p), q))
}

So for predict you get draws from the predictive distribution while for epred you get draws of the mean of the ppd.
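
To make the distinction concrete, here is a minimal base-R sketch for the simple linear model from the first post. It does not use brms: the "posterior draws" of alpha, beta and sigma are made-up stand-ins generated with rnorm(), and all variable names are hypothetical. It only illustrates the two computations, not how brms implements them internally.

```r
# Toy illustration only: fake "posterior draws" for the linear model
# y = alpha + beta * x + eps, with eps ~ Normal(0, sigma).
set.seed(1)
n_draws <- 4000
alpha <- rnorm(n_draws, mean = 1.0, sd = 0.10)       # stand-in posterior draws
beta  <- rnorm(n_draws, mean = 2.0, sd = 0.05)
sigma <- abs(rnorm(n_draws, mean = 1.5, sd = 0.05))  # kept positive

x_new <- 3  # predictor value to predict at

# epred-style: one conditional mean per draw, no residual noise added
epred_draws <- alpha + beta * x_new

# predict-style: one simulated outcome per draw, residual noise added
predict_draws <- rnorm(n_draws, mean = epred_draws, sd = sigma)

# Both sets of draws are centred on about the same value,
# but the predict-style draws are much more spread out.
c(mean_epred = mean(epred_draws), mean_predict = mean(predict_draws))
c(sd_epred = sd(epred_draws), sd_predict = sd(predict_draws))
```

In this sketch the spread of epred_draws reflects only uncertainty in alpha and beta, while the spread of predict_draws also includes the residual standard deviation sigma.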

Hope this helps.

JimBob · August 11, 2022, 9:15am

@scholz is it broadly accurate to say that with posterior_epred each draw represents an average response, whereas with posterior_predict each draw is more like a randomly drawn individual response?

jsocolar · August 11, 2022, 12:57pm

You’re right, but I don’t think the brms doc is necessarily wrong.

In your thinking, the posterior predictive distribution is a single distribution with a single expectation. Under this logic, the expectation is not something that would be computed draw-wise.

In the logic of the docs, there is a posterior predictive distribution that is conditional on the parameters, so that for each MCMC draw we can talk about the (draw-wise conditional) posterior predictive distribution. We can find that thing’s expectation, and we can reasonably call it “the expectation of the (draw-wise conditional) posterior predictive distribution”. Thus we get a posterior distribution of the expectation of the (draw-wise conditional) posterior predictive distribution.

If we leave off the last part of that final sentence, we get what you say: the posterior distribution of the expectation. If we leave off the first part, we get what brms says: the expectation of the posterior predictive distribution.
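
One way to write the two readings down, as a sketch in notation that is not from the brms docs (\theta^{(s)} is the s-th posterior draw, \tilde{y} a new response at predictor value x, and y the observed data):

$$
\mu^{(s)}(x) = \mathrm{E}\left[\tilde{y} \mid x, \theta^{(s)}\right], \qquad s = 1, \dots, S
$$

$$
\mathrm{E}\left[\tilde{y} \mid x, y\right] = \int \mathrm{E}\left[\tilde{y} \mid x, \theta\right] \, p(\theta \mid y) \, d\theta \approx \frac{1}{S} \sum_{s=1}^{S} \mu^{(s)}(x)
$$

The first line is the draw-wise quantity: the collection \mu^{(1)}(x), \dots, \mu^{(S)}(x) is a sample from the posterior of the expected value. The second line is the single expectation of the posterior predictive distribution, which by the law of total expectation is approximated by the average of those draws.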

FWIW, I agree with you that your language seems more in line with how I would understand “expected value of the posterior predictive distribution” out of context.

avehtari · August 11, 2022, 5:17pm

You have dropped one s. The original message has “values” which indicates many values.

rwilcom · August 11, 2022, 5:58pm

This is a nice clarification, but then shouldn’t “a posterior distribution of the expectation of the (draw-wise conditional) posterior predictive distribution” instead reference “expectations” (plural)? The confusing part about the brms documentation is then that it does seem to imply there is a single posterior predictive for each draw.

@scholz Just looked, and Rethinking explains what the posterior predictive is more clearly (but does not discuss the expected value of the distribution vs. the distribution of the expected values issue).

@avehtari The discussion in Regression and Other Stories describes epred (when passed new data) as “the expected prediction for a new data point” (p.116) and says “we can compute uncertainty in the expected value or predicted probability E(y_{new}) = logit^{-1}(x_{new} \beta)” (p.223), which seems to imply the posterior of the expected value. The same type of description is implied on p.315.

avehtari · August 11, 2022, 6:29pm

When writing this part, we coordinated with brms (Paul) and rstanarm (BenG) about the name epred and the explanations. If you think ROS, brms, or rstanarm could use a clearer sentence, please suggest one (English is not my first language, so I may parse sentences differently and thus possibly not see if some sentence is misleading).

jsocolar · August 11, 2022, 8:19pm

Good point, though to be fair, this could refer to multiple values for multiple datapoints (i.e. rows).

paul.buerkner · August 11, 2022, 10:18pm

Indeed, the idea behind “Expected ValueS of the Posterior Predictive Distribution” is that for different predictors, we get different expected values. Also, for each posterior sample, we technically get a different expected value. So, in my mind, the plural makes sense from different perspectives. That said, as Aki also stated, we are happy for suggestions on how this can be made clearer, because I see a lot of people struggling with these concepts at some point.
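
The two senses of the plural can be seen in the shape of the result. The following base-R sketch again uses made-up stand-in draws rather than a fitted brms model (all names are hypothetical). It builds a draws-by-observations matrix of expected values, which is the layout posterior_epred() returns for a simple univariate model.

```r
# Toy illustration only: fake posterior draws for y = alpha + beta * x + eps.
set.seed(2)
n_draws <- 1000
alpha <- rnorm(n_draws, mean = 1.0, sd = 0.10)
beta  <- rnorm(n_draws, mean = 2.0, sd = 0.05)

x_new <- c(-1, 0, 1, 2, 3)  # several predictor values (rows of newdata)

# Rows index posterior draws, columns index predictor values.
# outer() gives beta[s] * x_new[j]; alpha is recycled down the rows,
# so alpha[s] is added to every entry of row s.
epred <- alpha + outer(beta, x_new)

dim(epred)  # n_draws rows, length(x_new) columns

# Summaries per predictor value (i.e. per column)
colMeans(epred)
apply(epred, 2, quantile, probs = c(0.025, 0.975))
```

Each column is a posterior sample of the expected value at one predictor value, and each row is one draw's set of expected values across all predictor values.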

rwilcom · August 12, 2022, 11:04am

Yes, I see how the name conveys that as well. The issue seems to be that it could reasonably mean different things. Looking around at discussion of the use of “epred” functions in brms, it looks like people either don’t really know exactly what this captures beyond “excluding residual error” (and understanding that it will have a lower posterior variance), or understand it but vary in how they communicate it.

What would probably help would be including a brief description of how the two are distinguished, either in terms of the formalized model or as an informal description of the steps to obtain the estimate “manually.” Linking to or copying some of the explanation from the Stan User’s Guide could help, provided that it describes what is actually being done (although I find it somewhat long-winded): 27.3 Sampling from the posterior predictive distribution | Stan User’s Guide

bwiernik · August 17, 2022, 12:31pm

I think that the ROS language quoted above is very clear. I think the title of the brms documentation is rather awkward from a typical English usage perspective. I think it’s useful to emphasize what uncertainty is being characterized in the samples. For epred, it’s uncertainty in the expected value, so I think titling the page “Posterior Distribution of Expected Values” would be clearer.

isabellaghement · September 11, 2022, 1:46am

Dr. Brenton Wiernik pointed me to this thread after seeing me splash my confusion all over Twitter about what it is exactly that posterior_epred() does in comparison to posterior_predict() and why the rstanarm documentation states (indirectly) that posterior_epred() should be used sparingly.

In case the questions I posted earlier today on this same forum may prove helpful to you in improving the brms and rstanarm documentation, I am linking to them here:

Confusion on difference between posterior_epred() and posterior_predict() in a mixed effects modelling context - Modeling - The Stan Forums (mc-stan.org)

People like myself, coming from a predominantly Frequentist background, tend to struggle with the nuances of language and interpretation in the Bayesian R packages help files. At the end of the day, we may not be able to understand all the tricky nuances, but what we do want is to know (at least broadly) which function to apply in which practical setting.

If you have a chance to look at my questions, you will see that I created a table of potential use cases for a relatively simple mixed effects model and took a stab at guessing whether posterior_epred() or posterior_predict() should be applied to each of those cases. I am not sure whether I am on the right track with that, but having a package vignette or commented examples in the help files for brms or rstanarm along those lines would be invaluable. That would at least give people a good starting point and then, if they want to tease out the deeper nuances, they could do so later on. (I do some teaching on top of my consulting and I know that being able to build on a solid foundation will make learning something new a lot easier than building it on shaky ground.)

Here is the table in case you won’t have a chance to read the questions I linked to:

Though I am probably not the best person to help with any documentation updates (a genuine Bayesian would be a much better fit), I would be happy to help in any small way I can, such as suggesting examples to be included in the documentation/vignette (along the lines devised above), taking a tentative stab at the R syntax and having a more experienced person confirm what looks right and what looks wrong, etc.

I would love it if there was a place for tables like the one above somewhere in the documentation not just for this model but for other, more complex models (e.g., ordbetareg, which interests me directly).

Thank you!

Isabella

paul.buerkner · September 11, 2022, 6:54pm

I really like the table you are providing. Would you mind opening an issue on GitHub (paul-buerkner/brms: brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan) pointing to this thread, so I don’t forget to add such examples and explanations to the brms documentation of posterior_predict and posterior_epred?

isabellaghement · September 11, 2022, 10:54pm

Hi Paul,

Thank you for your kind words - they mean more than you can imagine! As per your request, I added a new issue here:

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan · Issue #1408 · paul-buerkner/brms (github.com)

Hopefully, this is what you need. If not, please just let me know.

Thank you!

Isabella

bjg · September 15, 2022, 8:16pm

Note that the table as-is needs a little work, since it’s still not clear from the table what is included or excluded in the predictions. This is why the response to the table in that same thread focuses on describing the treatment of random effects first, and then thinks about substantive use cases: Confusion on difference between posterior_epred() and posterior_predict() in a mixed effects modelling context - #2 by bwiernik

The table in “which average is best?” at this post also discusses this: A guide to correctly calculating posterior predictions and average marginal effects with multilevel Bayesian models | Andrew Heiss

I do think that providing, in the documentation, the model that the predictions are drawn from is probably the best way to remove ambiguity.
