Sampling from the posterior predictive distribution

If the pp distribution is p(yrep | y) = \int p(yrep | theta) p(theta | y) dtheta,
and we know just some (S) samples from the posterior then the pp distribution should be

p(yrep | y) = mean( p(yrep | theta_s) ).

If, for instance, the likelihood is gaussian, then the pp distribution is an average of S gaussians of parameters theta_s. I would have made a random sample of, say, M numbers of every those gaussians to get a matrix of S rows and M columns, and then the pp distribution would be (with R):

colMeans(yrep) # A vector of M elements.

But in the many examples and documentation I read the sampling procedure is to sample M = 1 from the s-th likelihood and have the distribution of yrep as the vector of S samples.

What is wrong in my first assumptions for the distribution?

p(yrep | y) is conceptually the integral over theta of the function p(y_rep | theta) * p(theta | y). We can draw from the distribution whose PDF is p(yrep | y) by repeating the following two steps many times:

  1. Draw theta_tilde from the distribution whose PDF is p(theta | y)
  2. Draw y_rep from the distribution whose PDF is p(y_rep | theta = theta_tilde)

In practice, we already have a bunch of draws from the posterior distribution, so we take each of them to be theta_tilde in turn and draw from the distribution whose PDF is p(y_rep | theta_tilde). You can then evaluate any function of the draws of y_rep.

What you are proposing is to estimate the expectation of theta — which is defined as the integral over theta of the function theta * p(theta | y) — and calling that theta_bar. If you then draw from the distribution whose PDF is p(y_alt | theta = theta_bar), that is only the right thing to do if you were certain that the unknown theta really is theta_bar, which you can never be absolutely certain of. Otherwise, the resulting draws of y_rep do not approximate draws of y_rep; they will have too little variance and tail heaviness because you have discarded you uncertainty over what theta is.

Thanks for your answer.

I had misunderstood (almost) everything. I have to read more carefully next time :-)

I have found another explanation to my doubts here, maybe it can help others.

The integration is done in this case just looking to one side of the table, no summations at all.

Your thread is quite old by now, but perhaps this new thread could also help to clarify any open questions.