If the pp distribution is p(yrep | y) = \int p(yrep | theta) p(theta | y) dtheta,
and we only have S draws theta_1, ..., theta_S from the posterior, then the pp density should be approximated by
p(yrep | y) ≈ (1/S) * sum over s of p(yrep | theta_s).
If, for instance, the likelihood is Gaussian, then the pp distribution is an average of S Gaussians with parameters theta_s. I would have drawn a random sample of, say, M numbers from each of those Gaussians to get a matrix of S rows and M columns, and then the pp distribution would be (in R):
colMeans(yrep) # A vector of M elements.
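For concreteness, here is a small sketch of what I had in mind, with made-up posterior draws mu and sigma for a Gaussian likelihood (the numbers are only for illustration):

S     <- 4000                       # number of posterior draws
M     <- 100                        # samples per posterior draw
mu    <- rnorm(S, 0, 1)             # pretend these are posterior draws of the mean
sigma <- abs(rnorm(S, 1, 0.1))      # pretend these are posterior draws of the sd
# Row s holds M draws from N(mu[s], sigma[s]) (column-major fill plus recycling)
yrep  <- matrix(rnorm(S * M, mean = mu, sd = sigma), nrow = S, ncol = M)
colMeans(yrep)                      # my proposed pp distribution: a vector of M elements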
But in the many examples and documentation I have read, the sampling procedure is to draw M = 1 sample from the s-th likelihood and take the vector of S samples as the distribution of yrep.
What is wrong with my initial assumptions about the distribution?
p(y_rep | y) is conceptually the integral over theta of the function p(y_rep | theta) * p(theta | y). We can draw from the distribution whose PDF is p(y_rep | y) by repeating the following two steps many times:
Draw theta_tilde from the distribution whose PDF is p(theta | y)
Draw y_rep from the distribution whose PDF is p(y_rep | theta = theta_tilde)
In practice, we already have a bunch of draws from the posterior distribution, so we take each of them to be theta_tilde in turn and draw from the distribution whose PDF is p(y_rep | theta_tilde). You can then evaluate any function of the draws of y_rep.
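In R terms, for a Gaussian likelihood, the procedure looks like the following sketch, where mu and sigma are hypothetical vectors holding the S posterior draws of the mean and standard deviation (replace them with whatever your sampler actually returns):

mu    <- rnorm(4000, 0, 1)          # stand-ins for S posterior draws of the mean
sigma <- abs(rnorm(4000, 1, 0.1))   # stand-ins for S posterior draws of the sd
S     <- length(mu)

y_rep <- rnorm(S, mean = mu, sd = sigma)  # one draw of y_rep per posterior draw

mean(y_rep)                     # e.g. the posterior predictive mean
quantile(y_rep, c(0.05, 0.95))  # e.g. a 90% predictive interval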
What you are proposing is to estimate the expectation of theta, which is defined as the integral over theta of the function theta * p(theta | y), and to call that estimate theta_bar. If you then draw from the distribution whose PDF is p(y_alt | theta = theta_bar), that is only the right thing to do if you are certain that the unknown theta really is theta_bar, which you can never be absolutely certain of. Otherwise, the resulting draws of y_alt do not approximate draws of y_rep; they will have too little variance and too little tail heaviness, because you have discarded your uncertainty about what theta is.
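A toy comparison makes the variance deficit visible. With fabricated posterior draws for a Gaussian mean (and the sd fixed at 1 for simplicity), the proper posterior predictive draws have variance roughly equal to the posterior variance of theta plus the likelihood variance, while plug-in draws at theta_bar only have the likelihood variance:

set.seed(1)
S  <- 1e5
mu <- rnorm(S, 0, 1)                       # stand-in posterior draws of the mean

y_rep_proper <- rnorm(S, mean = mu, sd = 1)        # one y_rep per posterior draw
y_rep_plugin <- rnorm(S, mean = mean(mu), sd = 1)  # plug in theta_bar instead

var(y_rep_proper)   # about 2: posterior variance of theta plus likelihood variance
var(y_rep_plugin)   # about 1: the uncertainty about theta has been thrown away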