# How to compute expected value of the posterior predictive distribution (epred)

Rather than cluttering up other threads, I thought it might be helpful to start a topic for this discussion specifically.

In trying to understand what expectations of predictions represent, I’ve been trying to compute them in generated quantities, with little success.

I am currently working on a beta binomial model. I used brms first to generate the Stan code then modified for my use case

In looking through github, I found this:

posterior_epred_beta_binomial <- function(prep) {
# beta part included in mu
trials <- data2draws(prep$data$trials, dim_mu(prep))
prep$dpars$mu * trials
}



versus posterior_predict_beta_binomial:

posterior_predict_beta_binomial <- function(i, prep, ntrys = 5, ...) {
rdiscrete(
n = prep$ndraws, dist = "beta_binomial", size = prep$data$trials[i], mu = get_dpar(prep, "mu", i = i), phi = get_dpar(prep, "phi", i = i), lb = prep$data$lb[i], ub = prep$data$ub[i], ntrys = ntrys ) }  So I attempted to add that to the generated quantities block: functions { /* compute correlated group-level effects * Args: * z: matrix of unscaled group-level effects * SD: vector of standard deviation parameters * L: cholesky factor correlation matrix * Returns: * matrix of scaled group-level effects */ matrix scale_r_cor(matrix z, vector SD, matrix L) { // r is stored in another dimension order than z return transpose(diag_pre_multiply(SD, L) * z); } } data { int<lower=1> N; // number of observations array[N] int Y; // response array[N] int nTrials; // number of trials int<lower=1> K; // number of coefficients matrix[N, K] X; // design matrix // data for group-level effects subjects int<lower=1> nSubj; // number of subjects int<lower=1> mSubj; // number of coefficients per level array[N] int<lower=1> Subj; // subject index // subject-level noise-level values vector[N] NL_0; vector[N] NL_1; vector[N] NL_2; vector[N] NL_3; int<lower=1> nCor; // number of subject-level correlations } parameters { vector[K] b; // regression coefficients real<lower=0> phi; // precision parameter vector<lower=0>[mSubj] subj_sd; // subject-level standard deviations matrix[mSubj, nSubj] subj_z; // standardized subject-level effects cholesky_factor_corr[mSubj] L; // cholesky factor of correlation matrix } transformed parameters { matrix[nSubj, mSubj] r_subj; // actual subject-level noise effects vector[nSubj] r_0; vector[nSubj] r_1; vector[nSubj] r_2; vector[nSubj] r_3; // compute actual subject-level effects r_subj = scale_r_cor(subj_z, subj_sd, L); r_0 = r_subj[ : , 1]; r_1 = r_subj[ : , 2]; r_2 = r_subj[ : , 3]; r_3 = r_subj[ : , 4]; // initialize linear predictor term vector[N] mu = rep_vector(0.0, N); mu += X * b; // add more terms to the linear predictor for (n in 1 : N) { mu[n] += r_0[Subj[n]] * NL_0[n] + r_1[Subj[n]] * NL_1[n] + r_2[Subj[n]] * NL_2[n] + r_3[Subj[n]] * NL_3[n]; } mu = inv_logit(mu); } model { real lprior = 0; // prior contributions to the log posterior // priors lprior += normal_lupdf(b | 0, 1); lprior += gamma_lupdf(phi | 0.01, 0.01); lprior += normal_lupdf(subj_sd | 0, 1); lprior += lkj_corr_cholesky_lupdf(L | 1); target += lprior; target += std_normal_lupdf(to_vector(subj_z)); // likelihood for (n in 1 : N) { target += beta_binomial_lupmf(Y[n] | nTrials[n], mu[n] * phi, (1 - mu[n]) * phi); } } generated quantities { vector[N] ypred ; vector[N] epred ; vector[N] log_lik ; // compute subject-level correlations corr_matrix[mSubj] Cor = multiply_lower_tri_self_transpose(L); vector<lower=-1, upper=1>[nCor] cor; // extract upper diagonal of correlation matrix for (k in 1 : mSubj) { for (j in 1 : (k - 1)) { cor[choose(k - 1, 2) + j] = Cor[j, k]; } } for (i in 1:N){ epred[i] = mu[i] * nTrials[i] ; } for (i in 1:N){ ypred[i] = beta_binomial_rng(nTrials[i], mu[i] * phi, (1 - mu[i]) * phi); } for (i in 1:N){ log_lik[i] = beta_binomial_lpmf(Y[i] | nTrials[i], mu[i] * phi, (1 - mu[i]) * phi); } }  The samples of ‘epred’ are nothing like what I get by using posterior_epred from brms. brms is giving me epreds in the 2000-3000 range, my generated quantities are in the 0 to 2 range. I’m clearly missing something, but I’m not sure what. Can anyone assist with how I can calculate comparable epreds? If I take draws of mu, draws <- sample(c(1:4000), size = 250) mu <- as_draws_matrix(subset_draws(Fit$draws()
, variable = "mu"
, draw = draws))



combine them with my input data,

Pred_Frame <- cbind(Data,
as_tibble(t(mu))) %>%
pivot_longer(1:last_col(), names_to = ".draw", values_to = "mu") %>%
mutate(Prob = as.numeric(mu)
, .draw = as.numeric(.draw))



then take the mean of mu for each group and each sample,

Epred_Frame <- Pred_Frame %>%
group_by(Group, Condition,  .draw) %>%
summarize(epred = mean(mu))



I seem to get pretty close.

Is that the correct interpretation/implementation?

I’d recommend this: 27 Posterior Predictive Sampling | Stan User’s Guide

The short answer is that to compute some expectation relative to the posterior predictive distribution, p(\tilde{y} \mid y), where \tilde{y} is “test” data and y is “training” data, goes as follows

\mathbb{E}[f(\tilde{y}) \mid y] \approx \frac{1}{M} \sum_{m=1}^M f(\tilde{y}^{(m)}),

where \tilde{y}^{(m)} \sim p(\tilde{y} \mid y). It’s important to include sampling uncertainty as well as estimation uncertainty for sampling \tilde{y}^{(m)}. For instance, if you have a beta-binomial model and have posterior draws \theta^{(m)} \sim p(\theta \mid y) for model parameters \theta, then you want to simulate \tilde{y} \sim \textrm{beta-binomial}(N, \theta) rather than just using the expectation N \cdot \theta.

Thanks!

The manual gives this example:

data {
int<lower=0> N;
array[N] int<lower=0> y;
}
parameters {
real<lower=0> lambda;
}
model {
lambda ~ gamma(1, 1);
y ~ poisson(lambda);
}
generated quantities {
int<lower=0> y_tilde = poisson_rng(lambda);
}


The expectations are computed as:

generated quantities {
real f_val = f(y_tilde, theta);
// ...
}


Am I correct that I would use

vector[N] epred;

epred  += beta_binomial(y_tilde, mu * phi, (1 -  mu) * phi);



Is that what brms does for posterior_epred()?

That seems to conflict with the documentation saying it doesn’t include the variance/dispersion parameters.

I tried this with a student-t model and all epred samples were nan:

  vector[N] ypred;
vector[N] log_lik;
vector[N] epred;

for(i in 1:N){
ypred[i] = student_t_rng(nu, mu[i], sigma[i] ) ;
}

epred += student_t_lpdf(ypred | nu_unscaled, mu_unscaled, sigma_unscaled);

for(i in 1:N){
log_lik[i] = student_t_lpdf(Y[i] | nu, mu[i], sigma[i]); ;
}



Still trying to dig further, I fit a student model with brms and called add_epred_draws() from tidybayes.

The .epred values are different for every subject in the same conditions, so there is still individual variability represented by .epred.

When I fit the same model with cmdstanr and then take draws of mu, there are very different from the values of .epred.

I also tried

for(i in 1:N){
epred[i] = student_t_lpdf(ypred[i] | nu, mu[i], sigma[i]); ;
}


Those values are nowhere close to what I get from add_epred_draws, either.

This is what brms has for epred from the student family:

posterior_epred_student <- function(prep) {
if (!is.null(prep$ac$lagsar)) {
prep$dpars$mu <- posterior_epred_lagsar(prep)
}
prep$dpars$mu
}


I think @andrewheiss summarizes it nicely here:

1 Like
1 Like

This was extremely helpful! Thank you!

What I’d like to do is recreate posterior_epred manually, which you show nicely for the gaussian case.

What would “the average of each draw from posterior_predict” look like?

Posterior_predict is easy to do in generated quantities using *_rng functions. But what would I average over to get epred?

For a Student-t example, if I have

generated quantities {

vector[N] ypred;

for(i in 1:N){
ypred[i] = student_t_rng(nu, mu[i], sigma[i] ) ;
}

}



That will give me posterior predictions (4000 for each observation in the data (i). What do I average over?

posterior_epred from a brms fit will give 4000 epred’s for each observation.

See this notebook I made when working through Bayes Rules! chapter 9, where I make their example model in rstanarm, brms, and raw Stan: bayesf22 Notebook - 9: Simple normal regression

In the Stan code, I use *_rng to create posterior predict, like you said:

generated quantities {
vector[n] Y_rep;

for (i in 1:n) {
Y_rep[i] = normal_rng(mu[i], sigma);
}
}


If you go down to the “Posterior Prediction” section (bayesf22 Notebook - 9: Simple normal regression) and click on the Stan tab, you can see how to get epreds from those

2 Likes

Thank you very much!!!

Would the equivalent to brms epred be the mean of predictions for each row/observation?

I’m still confused by why there are multiple epred draws per row from brms

1 Like

In digging around through available functions, I found rstudent_t(n, df, mu = 0, sigma = 1) in brms

So I tried this:

Epred_Frame <- left_join(Data %>%
mutate(row = row_number())
, fit %>%
, ndraws = 100) %>%
mutate(epred = rstudent_t(n(), df = nu, mu = mu, sigma = 1)) %>%
ungroup() %>%
select(row, epred, .draw)
, by = "row")



Which gives what looks like a reasonable approximation. Perhaps it’s the fixing of sigma to 1 that is the difference between .epred and '.predictioninbrms?

Epred is the expectation of the data given the covariates. This expectation is taken draw-wise over the posterior.

Is it correct that posterior_epred is like using posterior_pred and fixing sigma to 1 for the rng?

From the brms code:

rstudent_t <- function(n, df, mu = 0, sigma = 1) {
if (isTRUE(any(sigma < 0))) {
stop2("sigma must be non-negative.")
}
mu + sigma * rt(n, df = df)
}


So it looks like if sigma were set to 1, the result is based on mu (and it’s uncertainty). Is that what happens when posterior_epred() is called?

No, that’s not correct. I’m unsure of what is giving you that impression. posterior_pred gives a draw from the posterior predictive distribution. posterior_epred gives the posterior distribution for the expectation of the response.

I’m trying to calculate it myself, but can’t find anything that comes close.

@andrewheiss 's write up has this for epred:

epred_manual <- model_normal |>
mutate(mu = b_Intercept +
(b_flipper_length_mm *
penguins_avg_flipper\$flipper_length_mm),  # This is posterior_linpred()
y_new = rnorm(n(), mean = mu, sd = sigma))  # This is posterior_predict()

# This is posterior_epred()
epred_manual |>
summarize(epred = mean(y_new))


Calling mean averages over the draws, so there is a only single estimate/draw rather than draws of ndraws() per row.

What does brms do to get multiple epreds per observation?

This post:

Suggests posterior_epred() would be mu` for a Student model.

posterior_epred is just the (conditional) expected value of y, so just mu in this case. sigma doesn’t come into play until you want posterior_predict.

1 Like

posterior_epred doesn’t average over draws. It’s computed and stored for each draw in brms. So in the student t case you mentioned it would contain all posterior draws of mu (ndraws x nobs if you extract it as a matrix).

1 Like