Use simulated or observed covariates in prior predictive checks?

In a typical Bayesian workflow, is it recommended to simulate our covariates before generating the prior predictive distribution? Are there any potential problems with using our observed covariates to check our priors?


Good point. I currently believe that using the observed covariate values can in fact be beneficial: you don’t have to guess what the plausible range of values is, etc. The biggest advantages of simulating covariate values are, IMHO:

  • can be done before you collect data
  • can be done once and reused for a class of similar model-dataset combinations
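To make the two options concrete, here is a minimal sketch using only the Python standard library. The model (a simple linear regression with Normal(0, 1) priors on `alpha` and `beta` and an Exponential(1) prior on `sigma`), the function name `prior_predictive`, and all covariate values are hypothetical placeholders, not anything from the question. The point is that the prior predictive draws depend only on the priors and the covariates `x`, so you can plug in either the observed covariates or covariates simulated from an assumed range.

```python
import random


def prior_predictive(x, n_draws=1000, seed=0):
    """Draw outcomes y for covariates x using the priors only (no observed y).

    Hypothetical model, for illustration:
      alpha ~ Normal(0, 1), beta ~ Normal(0, 1), sigma ~ Exponential(1)
      y_i ~ Normal(alpha + beta * x_i, sigma)
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        alpha = rng.gauss(0.0, 1.0)
        beta = rng.gauss(0.0, 1.0)
        sigma = rng.expovariate(1.0)  # rate 1, i.e. mean 1
        draws.append([rng.gauss(alpha + beta * xi, sigma) for xi in x])
    return draws


# Option A: observed covariates (placeholder values); outcomes are never used.
x_observed = [0.3, 1.2, 2.5, 3.1, 4.8]

# Option B: covariates simulated from an assumed range; usable before data exist
# and reusable across similar model-dataset combinations.
cov_rng = random.Random(1)
x_simulated = [cov_rng.uniform(0.0, 5.0) for _ in range(5)]

draws_obs = prior_predictive(x_observed)
draws_sim = prior_predictive(x_simulated)
```

Option B requires committing to an assumed covariate range (here 0 to 5, chosen arbitrarily), which is exactly the guesswork that using observed covariates avoids.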

The biggest potential problem I see with using observed values is that you may be tempted to sneak some properties of the observed outcomes into what you consider a good prior. This temptation needs to be resisted: only properties that are defensible without reference to the observed outcomes should guide your priors.
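One way to keep the check honest is to fix the acceptance criterion from domain knowledge before looking at the outcomes, and compute it only from prior predictive draws. A minimal sketch follows; the function `fraction_outside`, the bounds `LOWER`/`UPPER`, and the example draws are all hypothetical.

```python
def fraction_outside(draws, lower, upper):
    """Share of prior predictive values falling outside [lower, upper].

    `draws` is a list of simulated datasets (each a list of y values)
    generated from the priors; the observed outcomes are not an input.
    """
    values = [y for draw in draws for y in draw]
    outside = sum(1 for y in values if y < lower or y > upper)
    return outside / len(values)


# Hypothetical plausible range for the outcome, justified by domain
# knowledge (e.g. physical limits), NOT by the observed y.
LOWER, UPPER = -20.0, 20.0

example_draws = [[1.0, 25.0], [-3.0, 4.0]]
share = fraction_outside(example_draws, LOWER, UPPER)
print(share)  # 0.25
```

If a large share of prior predictive values falls outside a range you could have written down before seeing any data, that is a defensible reason to tighten the priors; a range read off the observed outcomes is not.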

Best of luck with your checks!