Fake data for prior predictive checks

Hi all,

I am fairly new to Bayesian statistics and Stan, but I have already learned so much and it’s really exciting for me!

I have started generating fake data to check my priors. One question I keep running into is: how “fake” should my fake data be? I am currently building a Poisson model with an offset and two numeric predictors. It seems easier to check the priors when I use the offset from the real data together with fake values for the predictors than when I fake all of them. However, I guess it could be bad practice not to use fake values for everything?

Can anyone help me with this? Thanks in advance!



I recently created a small, deliberately limited example for my students, which you can find here; I think the approach generalizes to your case. (It is heavily inspired by the book written by @richard_mcelreath, but any faults in the example are of course mine…)

The source code for the example can be found here:

I usually think of prior predictive analysis as simply checking that we do not allow too many absurd values (extreme is still ok, absurd is not).
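
To make the "extreme is fine, absurd is not" idea concrete for a Poisson model with an offset and two predictors, here is a minimal sketch in plain Python. It is not the linked example and not Stan code. Everything specific in it is an assumption for illustration: the exposure values stand in for offsets taken from real data, the predictor values are made up on a standardized scale, all three coefficients get normal(0, sd) priors, and the cut-off of 10 events per unit of exposure for calling a count "absurd" is arbitrary and would have to come from domain knowledge.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate using only the standard library."""
    if lam < 30.0:
        # Knuth's multiplication method, exact for small rates.
        threshold = math.exp(-lam)
        k, p = 0, rng.random()
        while p > threshold:
            k += 1
            p *= rng.random()
        return k
    # Normal approximation for large rates; adequate for a sanity check.
    return max(0, round(rng.gauss(lam, math.sqrt(lam))))


def prior_predictive_counts(exposure, x1, x2, prior_sd, n_draws=1000, seed=1):
    """Simulate counts from the priors alone (no observed outcomes involved)."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_draws):
        # Assumed priors: normal(0, prior_sd) on intercept and both slopes.
        alpha = rng.gauss(0.0, prior_sd)
        beta1 = rng.gauss(0.0, prior_sd)
        beta2 = rng.gauss(0.0, prior_sd)
        row = []
        for e, a, b in zip(exposure, x1, x2):
            # log(exposure) enters as an offset with coefficient fixed at 1.
            eta = math.log(e) + alpha + beta1 * a + beta2 * b
            # Cap the linear predictor so math.exp cannot overflow.
            lam = math.exp(min(eta, 700.0))
            row.append(poisson_draw(rng, lam))
        sims.append(row)
    return sims


def share_absurd(sims, exposure, max_rate):
    """Fraction of simulated counts above max_rate events per unit exposure."""
    flat = [(y, e) for row in sims for y, e in zip(row, exposure)]
    return sum(y > max_rate * e for y, e in flat) / len(flat)


# Hypothetical inputs: exposures as if copied from the data, fake predictors.
exposure = [120.0, 85.0, 300.0, 40.0, 150.0]
x1 = [-1.2, -0.4, 0.0, 0.7, 1.5]
x2 = [0.9, -1.1, 0.3, -0.2, 0.6]

wide = prior_predictive_counts(exposure, x1, x2, prior_sd=10.0)
tight = prior_predictive_counts(exposure, x1, x2, prior_sd=0.5)

share_wide = share_absurd(wide, exposure, max_rate=10.0)
share_tight = share_absurd(tight, exposure, max_rate=10.0)
print(f"share of absurd counts, sd=10:  {share_wide:.3f}")
print(f"share of absurd counts, sd=0.5: {share_tight:.3f}")
```

Because of the log link, a prior that looks harmless on the coefficient scale can put a large share of its mass on counts far beyond anything plausible, which is what comparing the two shares is meant to show. Whether the exposure column is copied from the data or simulated does not change the mechanics of the check, since no observed outcomes are used in either case.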