How to get samples from the posterior predictive distribution using Stan

My questions are

  1. How to get a sample from the posterior predictive distribution using rstan.

  2. The difference between two distributions: the model distribution with its parameter fixed at the EAP (expected a posteriori) estimate, and the posterior predictive distribution (PPD).


In Andrew Gelman's book "Bayesian Data Analysis", he evaluates a statistical model using data drawn from the posterior predictive distribution p(\cdot | D) for given data D. Recall that it is defined by

p(y | D) = \int f(y|\theta)\pi( \theta|D )d\theta

where y is future data, f(y|\theta) is the likelihood (model) with parameter \theta, and \pi(\theta|D) is the posterior distribution.

  1. Using rstan, is it possible to draw data of the kind described in the data block of the Stan file from the posterior predictive distribution? To tell the truth, my model is very complex and I am not sure of the concrete form of f(y | \theta). Using the lp__ samples in a stanfit object together with Monte Carlo integration gives me one method, but I would also need f(y | \theta), and this is very hard for me.

  2. I wonder whether there is a significant difference between samples obtained from the model distribution f(\cdot | \theta_{\text{EAP}}) and samples obtained from the posterior predictive distribution.

That is, how do the following two random variables X and Y differ?

X \sim p(y|D)

Y \sim f(y| \theta_{EAP})

where p(y|D) is the posterior predictive density and f(y|\theta_{EAP}) is the model evaluated at the EAP estimate.


Since you already have the book Bayesian Data Analysis, I can point you to a couple relevant parts of that book. I’m using the 3rd edition.

From p. 146:

In practice, we usually compute the posterior predictive distribution using simulation. If we already have S simulations from the posterior density of \theta, we just draw one y^{\rm rep} from the predictive distribution for each simulated \theta; we now have S draws from the joint posterior distribution, p(y^{\rm rep}, \theta|y).

There’s also R code on page 596 of the book that illustrates how to do what was just quoted above. For any onlookers who don’t have the book, said code can be found here: http://www.stat.columbia.edu/~gelman/book/software.pdf
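To spell out why the quoted recipe gives posterior predictive draws, here is the same argument written as equations in the notation of the quote. The draw index s is notation added here for illustration, and the argument assumes, as is standard, that y^{\rm rep} depends on the observed y only through \theta.

\theta^{s} \sim p(\theta | y), \quad y^{{\rm rep}, s} \sim p(y^{\rm rep} | \theta^{s}), \quad s = 1, \dots, S

p(y^{\rm rep}, \theta | y) = p(y^{\rm rep} | \theta) \, p(\theta | y)

p(y^{\rm rep} | y) = \int p(y^{\rm rep} | \theta) \, p(\theta | y) \, d\theta

Each pair (y^{{\rm rep}, s}, \theta^{s}) is therefore a draw from the joint distribution in the second line, and simply ignoring \theta^{s} leaves y^{{\rm rep}, s} as a draw from the posterior predictive distribution in the third line. The integral never has to be evaluated explicitly.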


Thank you !!
Your link helps me, since my copy of the book is the second edition, which does not include the Stan code.
At the moment I still cannot understand the sentence from p. 146.

I will read the web page and try to understand.

Thank you!! Now I understand!!

Let f(y | \theta) be a model. Then, given data y_0, we can get MCMC samples \theta_1, \theta_2, ..., \theta_N from the posterior with Stan. From these samples we get the sequence of models f(y | \theta_1), f(y | \theta_2), ..., f(y | \theta_N). Drawing data y_1, y_2, ..., y_N so that

y_1 \sim f( y | \theta_1),

y_2 \sim f( y | \theta_2),

\cdots

y_N \sim f( y | \theta_N),

we can interpret the samples y_1, ..., y_N as draws from the posterior predictive distribution. To implement this procedure we no longer need Stan, provided the sampling from f(y | \theta_i) can be done with the stats package.
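The procedure above can be sketched outside Stan in a few lines. The following is only a minimal illustration under loud assumptions: it is written in Python rather than R, the model is a toy Normal(\theta, 1) likelihood, and the "MCMC samples" are simulated from a made-up Normal(2.0, 0.5) posterior instead of being extracted from a stanfit object. The function names are hypothetical. It also draws from the model at the EAP estimate so that the two kinds of samples from question 2 can be compared.

```python
import random
import statistics


def posterior_predictive_draws(theta_draws, simulate, rng):
    """Draw one replicated value per posterior draw: y_i ~ f(y | theta_i)."""
    return [simulate(theta, rng) for theta in theta_draws]


def plug_in_draws(theta_draws, simulate, rng):
    """Draw the same number of values from f(y | theta_EAP),
    where theta_EAP is the posterior mean of the draws."""
    theta_eap = statistics.fmean(theta_draws)
    return [simulate(theta_eap, rng) for _ in theta_draws]


# Toy model (assumption for illustration): y | theta ~ Normal(theta, SIGMA).
SIGMA = 1.0


def simulate_normal(theta, rng):
    return rng.gauss(theta, SIGMA)


rng = random.Random(1)

# Stand-in for MCMC output (assumption): pretend the posterior of theta
# is Normal(mean=2.0, sd=0.5). With Stan these would be extracted draws.
theta_draws = [rng.gauss(2.0, 0.5) for _ in range(20000)]

y_ppd = posterior_predictive_draws(theta_draws, simulate_normal, rng)
y_eap = plug_in_draws(theta_draws, simulate_normal, rng)

# In this toy setup the theoretical variances are
# SIGMA**2 + 0.5**2 = 1.25 (PPD) and SIGMA**2 = 1.0 (plug-in at EAP).
print("PPD variance    :", statistics.variance(y_ppd))
print("plug-in variance:", statistics.variance(y_eap))
```

In this toy case both samples are centred near the same value, but the posterior predictive draws are more spread out, because they carry the posterior uncertainty about \theta in addition to the sampling noise of the model. The plug-in draws at \theta_{EAP} ignore that uncertainty, so the gap between the two shrinks only when the posterior is very concentrated.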
