Expected Value of Posterior vs. Posterior of Expected Value with epred

The brms documentation states that posterior_epred.brmsfit() returns “Expected Values of the Posterior Predictive Distribution.” My understanding is that, for the model $y \sim \alpha + \beta X + \epsilon$, this means it returns the posterior of $\alpha + \beta X$ at a given value of $X$ at which we want to predict.
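Here is a minimal base-R sketch of that reading, without brms. The draws `alpha_draws` and `beta_draws` are simulated stand-ins for posterior draws, not output from a fitted model. Evaluating $\alpha + \beta x$ at one new $x$ for each draw gives one expected value per draw. The result is a whole distribution of expected values, not a single number.

```r
set.seed(1)
S <- 4000                                   # number of posterior draws (assumed)
# Stand-in posterior draws; in practice these would come from a fitted model
alpha_draws <- rnorm(S, mean = 1.0, sd = 0.10)
beta_draws  <- rnorm(S, mean = 0.5, sd = 0.05)

x_new <- 3
# One conditional expectation E[y | alpha, beta, x_new] per posterior draw
epred_manual <- alpha_draws + beta_draws * x_new

length(epred_manual)                        # one value per draw
quantile(epred_manual, c(0.025, 0.5, 0.975))
```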

However, isn’t this actually showing us the posterior of the expected value and not the expected value of the posterior?


I guess McElreath can explain that a lot better than I can (Statistical Rethinking 2022 Lecture 02 - Bayesian Inference - YouTube), or see Chapter 3 of Statistical Rethinking.

For posterior_predict, you draw a random sample from the likelihood's RNG for each parameter draw in your posterior.
For posterior_epred, you compute the mean of the likelihood for each of those posterior draws.
E.g., for a custom Kumaraswamy family:

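```r
# Custom-family helpers following brms's posterior_predict_<family> /
# posterior_epred_<family> naming convention.
# rkumaraswamy() is assumed to be a user-defined RNG for the median-parameterized
# Kumaraswamy distribution; it is not part of brms and must be defined separately.
posterior_predict_kumaraswamy <- function(i, prep, ...) {
  mu <- brms::get_dpar(prep, "mu", i = i) # mu for this Kumaraswamy parameterization is the median
  p <- brms::get_dpar(prep, "p", i = i)
  # one random draw of y_i per posterior draw
  return(rkumaraswamy(prep$ndraws, mu, p))
}

posterior_epred_kumaraswamy <- function(prep) {
  mu <- brms::get_dpar(prep, "mu") # mu for this Kumaraswamy parameterization is the median
  p <- brms::get_dpar(prep, "p")
  # shape q implied by the median mu and shape p
  q <- -(log(2) / log1p(-mu^p))
  # closed-form mean of Kumaraswamy(p, q): q * B(1 + 1/p, q)
  return(q * beta((1 + 1 / p), q))
}
```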

So with posterior_predict you get draws from the posterior predictive distribution (PPD), while with posterior_epred you get draws of the mean of the PPD.
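The epred formula above can be sanity-checked by simulation. The sketch below assumes the Kumaraswamy(p, q) distribution with CDF $1 - (1 - y^p)^q$, which is the form the median formula in the code implies. `rkumaraswamy_median()` and `kumaraswamy_mean()` are hypothetical helper names, not brms functions. The sample median should match `mu`, and the sample mean should match the closed-form mean that epred returns for a single parameter draw.

```r
# Hypothetical RNG: inverse-CDF sampling for a median-parameterized Kumaraswamy
rkumaraswamy_median <- function(n, mu, p) {
  q <- -(log(2) / log1p(-mu^p))      # shape q implied by median mu and shape p
  u <- runif(n)
  (1 - (1 - u)^(1 / q))^(1 / p)      # solves 1 - (1 - y^p)^q = u for y
}
# Closed-form mean, same formula as posterior_epred_kumaraswamy above
kumaraswamy_mean <- function(mu, p) {
  q <- -(log(2) / log1p(-mu^p))
  q * beta(1 + 1 / p, q)
}

set.seed(2)
mu <- 0.3
p <- 2
y <- rkumaraswamy_median(2e5, mu, p)
median(y)                  # close to mu
mean(y)                    # close to the closed-form mean below
kumaraswamy_mean(mu, p)
```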

Hope this helps.


@scholz is it broadly accurate to say that with posterior_epred each draw represents an average response, whereas with posterior_predict each draw is more like a randomly drawn individual response?
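That reading can be illustrated with a Gaussian model and simulated stand-in draws (again hypothetical, not brms output). An epred-style draw is the conditional mean for that draw. A predict-style draw adds observation noise on top, so its spread is wider by roughly the average residual variance, per the law of total variance.

```r
set.seed(3)
S <- 4000
alpha_draws <- rnorm(S, 1.0, 0.10)           # stand-in posterior draws
beta_draws  <- rnorm(S, 0.5, 0.05)
sigma_draws <- abs(rnorm(S, 2.0, 0.10))
x_new <- 3

mu_draws      <- alpha_draws + beta_draws * x_new            # epred-style: average response
predict_draws <- rnorm(S, mean = mu_draws, sd = sigma_draws) # predict-style: one individual response

var(mu_draws)                          # uncertainty about the average only
var(predict_draws)                     # also includes residual noise
var(mu_draws) + mean(sigma_draws^2)    # law of total variance, close to var(predict_draws)
```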

You’re right, but I don’t think the brms doc is necessarily wrong.

In your thinking, the posterior predictive distribution is a single distribution with a single expectation. Under this logic, the expectation is not something that would be computed draw-wise.

In the logic of the docs, there is a posterior predictive distribution that is conditional on the parameters, so that for each MCMC draw we can talk about the (draw-wise conditional) posterior predictive distribution. We can find that thing’s expectation, and we can reasonably call it “the expectation of the (draw-wise conditional) posterior predictive distribution”. Thus we get a posterior distribution of the expectation of the (draw-wise conditional) posterior predictive distribution.

If we leave off the last part of that final sentence, we get what you say: the posterior distribution of the expectation. If we leave off the first part, we get what brms says: the expectation of the posterior predictive distribution.
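The two readings can be made concrete with a sketch using stand-in draws. Collapsing the draw-wise expectations into one number gives the single expectation of the marginal posterior predictive distribution, and it agrees with the mean of predict-style draws because $E[y] = E[E[y \mid \theta]]$. Keeping the expectations draw-wise gives the posterior distribution of the expectation, which is what epred returns.

```r
set.seed(4)
S <- 4000
mu_draws    <- rnorm(S, 2.5, 0.2)        # stand-in draw-wise expectations E[y | theta_s]
sigma_draws <- abs(rnorm(S, 2.0, 0.1))
ypred       <- rnorm(S, mu_draws, sigma_draws)

# Reading 1: one expectation of the whole (marginal) posterior predictive distribution
single_expectation <- mean(mu_draws)
# Reading 2: a posterior distribution of draw-wise expectations
quantile(mu_draws, c(0.05, 0.95))

# Both readings agree on the centre
c(single_expectation, mean(ypred))
```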

FWIW, I agree with you that your language seems more in line with how I would understand “expected value of the posterior predictive distribution” out of context.


You have dropped an “s”. The documentation says “Values”, plural, which indicates many values.


This is a nice clarification, but then shouldn’t “a posterior distribution of the expectation of the (draw-wise conditional) posterior predictive distribution” instead refer to “expectations” (plural)? The confusing part about the brms documentation, then, is that it does seem to imply there is a single posterior predictive distribution for each draw.

@scholz I just looked, and Rethinking explains what the posterior predictive distribution is more clearly (but it does not discuss the issue of the expected value of the distribution vs. the distribution of the expected values).

@avehtari The discussion in Regression and Other Stories describes epred (when passed new data) as “the expected prediction for a new data point” (p. 116) and says “we can compute uncertainty in the expected value or predicted probability $E(y_{new}) = \operatorname{logit}^{-1}(x_{new} \beta)$” (p. 223), which seems to imply the posterior of the expected value. Similarly, the same type of description is implied on p. 315.
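For the logistic case quoted from ROS, here is a sketch with stand-in coefficient draws (`b0`, `b1` are hypothetical). The epred-style quantity is the predicted probability $\operatorname{logit}^{-1}(x_{new}\beta)$, one per draw, so its spread is uncertainty in the expected value. A predict-style draw is a 0/1 outcome.

```r
set.seed(5)
S <- 4000
b0 <- rnorm(S, -1.0, 0.2)      # stand-in intercept draws
b1 <- rnorm(S,  0.8, 0.1)      # stand-in slope draws
x_new <- 1.5

p_draws <- plogis(b0 + b1 * x_new)               # epred-style: E(y_new), one probability per draw
y_draws <- rbinom(S, size = 1, prob = p_draws)   # predict-style: one 0/1 outcome per draw

quantile(p_draws, c(0.025, 0.975))   # uncertainty in the predicted probability
table(y_draws)                       # only 0s and 1s
```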

When writing this part, we coordinated with brms (Paul) and rstanarm (BenG) about the name epred and the explanations. If you think ROS, brms, or rstanarm could use a clearer sentence, please suggest one. (English is not my first language, so I may parse sentences differently and may not notice when a sentence is misleading.)


Good point, though to be fair this could refer to multiple values for multiple data points (i.e., rows).

Indeed, the idea behind “Expected ValueS of the Posterior Predictive Distribution” is that for different predictor values, we get different expected values. Also, for each posterior draw, we technically get a different expected value. So, in my mind, the plural makes sense from several perspectives. That said, as Aki also said, we are happy to receive suggestions on how this can be made clearer, because I see a lot of people struggling with these concepts at some point.
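Both senses of the plural show up in the shape of the output. According to its documentation, brms's posterior_epred returns an S x N matrix (posterior draws by observations) for most univariate families. The base-R stand-in below builds a matrix of that shape for a few hypothetical prediction rows: each row is one draw, and each column is one predictor value.

```r
set.seed(6)
S <- 1000
alpha_draws <- rnorm(S, 1.0, 0.10)      # stand-in posterior draws
beta_draws  <- rnorm(S, 0.5, 0.05)
newdata_x   <- c(0, 1, 2, 3)            # hypothetical prediction rows

# S x N matrix: entry [s, n] is alpha_s + beta_s * x_n
# (alpha_draws is recycled down each column of the outer product)
epred_mat <- alpha_draws + outer(beta_draws, newdata_x)

dim(epred_mat)
colMeans(epred_mat)    # one posterior-mean expected value per prediction row
```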

Yes, I see how the name conveys that as well. The issue seems to be that it could reasonably mean different things. Looking around at discussions of the “epred” functions in brms, it seems people either don’t really know exactly what they capture beyond “excluding residual error” (and therefore having lower posterior variance), or they understand it but vary in how they communicate it.

What would probably help is a brief description of how the two are distinguished, either in terms of the formal model or as an informal list of steps to obtain the estimate “manually.” Linking to or copying some of the explanation from the Stan User’s Guide could help, provided that it describes what is actually being done (although I find it somewhat long-winded): 27.3 Sampling from the posterior predictive distribution | Stan User’s Guide
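An informal version of those manual steps might look like this for a Poisson model with a log link, using stand-in draws (`b0`, `b1` are hypothetical). With a non-identity link, the epred-style quantity is the mean on the response scale, exp(eta), not the linear predictor itself.

```r
set.seed(7)
S <- 4000
b0 <- rnorm(S, 0.5, 0.1)            # stand-in posterior draws
b1 <- rnorm(S, 0.3, 0.05)
x_new <- 2

eta   <- b0 + b1 * x_new            # step 1: linear predictor, per draw
epred <- exp(eta)                   # step 2: mean of Poisson(exp(eta)), per draw
ypred <- rpois(S, lambda = epred)   # step 3: one simulated count per draw

summary(epred)   # continuous and positive: uncertainty in the expected count
table(ypred)     # integer counts: uncertainty in a new observation
```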


I think the ROS language quoted above is very clear, whereas the title of the brms documentation is rather awkward from a typical English-usage perspective. It’s useful to emphasize which uncertainty the samples characterize. For epred, it’s uncertainty in the expected value, so titling the page “Posterior Distribution of Expected Values” would be clearer.


Dr. Brenton Wiernik pointed me to this thread after seeing me splash my confusion all over Twitter about what exactly posterior_epred() does compared with posterior_predict(), and why the rstanarm documentation states (indirectly) that posterior_epred() should be used sparingly.

In case the questions I posted earlier today on this forum prove helpful to you in improving the brms and rstanarm documentation, I am linking to them here:

Confusion on difference between posterior_epred() and posterior_predict() in a mixed effects modelling context - Modeling - The Stan Forums (mc-stan.org)

People like me, coming from a predominantly frequentist background, tend to struggle with the nuances of language and interpretation in the help files of Bayesian R packages. At the end of the day, we may not be able to understand all the tricky nuances, but we do want to know (at least broadly) which function to apply in which practical setting.

If you have a chance to look at my questions, you will see that I created a table of potential use cases for a relatively simple mixed effects model and took a stab at guessing whether posterior_epred() or posterior_predict() should be applied to each of those cases. I am not sure whether I am on the right track with that, but having a package vignette or commented examples in the help files for brms or rstanarm along those lines would be invaluable. That would at least give people a good starting point and then, if they want to tease out the deeper nuances, they could do so later on. (I do some teaching on top of my consulting, and I know that building on a solid foundation makes learning something new a lot easier than building on shaky ground.)

Here is the table in case you don’t have a chance to read the questions I linked to:

Though I am probably not the best person to help with any documentation updates (a genuine Bayesian would be a much better fit), I would be happy to help in any small way I can, such as suggesting examples to be included in the documentation/vignette (along the lines devised above), taking a tentative stab at the R syntax and having a more experienced person confirm what looks right and what looks wrong, etc.

I would love it if there were a place for tables like the one above somewhere in the documentation, not just for this model but also for other, more complex models (e.g., ordbetareg, which interests me directly).

Thank you!

Isabella


I really like the table you are providing. Would you mind opening an issue on GitHub (paul-buerkner/brms: brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan) pointing to this thread, so I don’t forget to add such examples and explanations to the brms documentation of posterior_predict and posterior_epred?


Hi Paul,

Thank you for your kind words - they mean more than you can imagine! As per your request, I added a new issue here:

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan · Issue #1408 · paul-buerkner/brms (github.com)

Hopefully, this is what you need. If not, please just let me know.

Thank you!

Isabella


Note that the table as is needs a little work, since it’s still not clear from the table what is included in or excluded from the predictions. This is why the response to the table in that same thread focuses first on describing the treatment of random effects, and only then on substantive use cases: Confusion on difference between posterior_epred() and posterior_predict() in a mixed effects modelling context - #2 by bwiernik
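Here is a sketch of why the random-effects treatment matters, with stand-in draws for a random-intercept model (all names hypothetical). The three epred-style quantities differ only in how the group offset is handled. They roughly correspond to brms's `re_formula = NULL` (include group effects), `re_formula = NA` (exclude them), and `allow_new_levels = TRUE` with `sample_new_levels = "gaussian"` (a new, unseen group). These correspondences are loose analogies, so check the brms documentation for the exact semantics.

```r
set.seed(8)
S <- 4000
alpha_draws <- rnorm(S, 1.0, 0.1)        # stand-in population-level intercept draws
tau_draws   <- abs(rnorm(S, 0.8, 0.1))   # stand-in SD of the group intercepts
u_group3    <- rnorm(S, 0.6, 0.2)        # stand-in offset draws for one existing group

# (a) existing group, its offset included (cf. re_formula = NULL)
epred_group3 <- alpha_draws + u_group3
# (b) group offsets set to zero (cf. re_formula = NA)
epred_zero   <- alpha_draws
# (c) new, unseen group: draw its offset from N(0, tau) for each draw
#     (cf. allow_new_levels = TRUE, sample_new_levels = "gaussian")
epred_newgrp <- alpha_draws + rnorm(S, 0, tau_draws)

sapply(list(group3 = epred_group3, zero = epred_zero, new_group = epred_newgrp),
       function(d) round(c(mean = mean(d), sd = sd(d)), 2))
```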

The table in “which average is best?” in this post also discusses this: A guide to correctly calculating posterior predictions and average marginal effects with multilievel Bayesian models | Andrew Heiss

I do think that writing out, in the documentation, the model that the predictions are drawn from is probably the best way to remove ambiguity.
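For the simple Gaussian regression from the original question, writing the model out might look like the block below, where $s$ indexes posterior draws. This is a sketch for that one model, not a general statement of what brms computes for every family.

```latex
\begin{aligned}
y_i &\sim \operatorname{Normal}(\mu_i, \sigma), \qquad \mu_i = \alpha + \beta x_i \\
\texttt{posterior\_epred}:&\quad \mu^{(s)}_{new} = \operatorname{E}\!\left[y_{new} \mid \alpha^{(s)}, \beta^{(s)}, x_{new}\right] = \alpha^{(s)} + \beta^{(s)} x_{new} \\
\texttt{posterior\_predict}:&\quad \tilde{y}^{(s)}_{new} \sim \operatorname{Normal}\!\left(\mu^{(s)}_{new}, \sigma^{(s)}\right)
\end{aligned}
```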
