The concept of a posterior predictive distribution is the conditional distribution of a future outcome conditional on the current outcomes, whose PDF can be written as the the integral over the parameter space of: The PDF of the future outcome given the parameters multiplied by the PDF of the parameters given the current outcomes. Then you approximate that integral with draws from the posterior distribution composed with draws from the distribution that generated the data.

So, in the case of a flat linear regression model, you have draws of `beta`

and `sigma`

, where `sigma`

is the standard deviation of the errors. So you draw a realization of an error from that normal distribution for each realization of `sigma`

and then shift it over by the vector product of `x`

and the corresponding realization of `beta`

to get the a realization of the posterior predictive distribution.

In the case of a hierarchical model, you have to specify what you are conditioning on and what you are integrating over. For a new school, you first draw the error, then draw the intercept, then use a realization of the common coefficients and the school-specific intercept to shift the error over into a realization of the posterior predictive distribution.