I am hoping to check my understanding of how the generated quantities block actually samples from the posterior predictive distribution. In a simple hierarchical model where we estimate that effect d and sd sigma come from N(d,sigma) distribution, am I correct in understanding that I should arrive at the same conclusion for a posterior predictive distribution if at every iteration I were to draw a sample from a distribution with N(d[i],sigma[i])?

I am sure this is a hopelessly basic question but, as always, I would be grateful for any help you can provide.

The posterior predictive distribution is for observables. If in the generated quantities block, you do

normal_rng(d, sigma);

where d and sigma are declared in the parameters block, then you are just drawing from the posterior distribution of the unobservable effect in a new group (e.g. a school that you have not observed yet).

Thanks Ben, this helps clarify things for me. As a follow-up, if my model were a simple linear regression would the predictive distribution just be a prediction of a new data set based on parameters estimated at each iteration? How does this work in the case where we add an intercept for each group? Am I thinking correctly that it would be for e.g. the school intercept + non-varying parameters estimated at each iteration?

The concept of a posterior predictive distribution is the conditional distribution of a future outcome conditional on the current outcomes, whose PDF can be written as the the integral over the parameter space of: The PDF of the future outcome given the parameters multiplied by the PDF of the parameters given the current outcomes. Then you approximate that integral with draws from the posterior distribution composed with draws from the distribution that generated the data.

So, in the case of a flat linear regression model, you have draws of beta and sigma, where sigma is the standard deviation of the errors. So you draw a realization of an error from that normal distribution for each realization of sigma and then shift it over by the vector product of x and the corresponding realization of beta to get the a realization of the posterior predictive distribution.

In the case of a hierarchical model, you have to specify what you are conditioning on and what you are integrating over. For a new school, you first draw the error, then draw the intercept, then use a realization of the common coefficients and the school-specific intercept to shift the error over into a realization of the posterior predictive distribution.