Prior predictive check too stringent for actual priors?


Still quite new to Stan here. I have been practicing simulating the data-generating process to see which priors are sensible, i.e. whether parameters drawn from the priors produce data qualitatively similar to my dataset.

I try to allow wiggle room a bit beyond a realistic appearance, to make sure the priors are not too stringent. But it seems that the priors that make my simulations look good are often too stringent to include as actual priors in my model.

For example, I found \sigma\sim\textrm{Exp}(1) to be a good prior for the scale parameter in many settings (hierarchical models, ODE models, etc.), but it often looks very inappropriate in prior predictive checks. I get good-looking simulations with \textrm{Exp}(5) or \textrm{Exp}(10), but those always yield a poorer posterior sample.
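To make the comparison concrete, here is a minimal prior predictive check sketched in Python/NumPy, using a toy zero-mean normal observation model (an assumption for illustration, not any particular Stan model from this thread). It draws \sigma from exponential priors with different rates and looks at the scale of the simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)

def prior_predictive(rate, n_sims=1000, n_obs=50):
    """Draw sigma ~ Exp(rate), then simulate y ~ Normal(0, sigma)."""
    sigma = rng.exponential(scale=1.0 / rate, size=n_sims)
    # Broadcast each simulated sigma across n_obs fake observations.
    y = rng.normal(0.0, sigma[:, None], size=(n_sims, n_obs))
    return sigma, y

for rate in (1, 5, 10):
    sigma, y = prior_predictive(rate)
    print(f"Exp({rate}): median |y| = {np.median(np.abs(y)):.3f}, "
          f"95% of sigma draws below {np.quantile(sigma, 0.95):.3f}")
```

Comparing the printed summaries against the spread actually seen in the data is the essence of the check: a rate of 1 routinely produces data an order of magnitude wider than a rate of 10.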

I could try to come up with a small reprex so that people have a better chance of answering, and I will if required, but I have perused the forums and I feel this question must have been addressed, or that someone has already encountered it and thought of an answer.

I heard about Stan from McElreath’s awesome Rethinking course, where he advocates for and goes through many prior predictive checks in his lectures. He always uses \sigma\sim\textrm{Exp}(1), but I believe he doesn’t actually justify that choice from a prior-predictive-check perspective. I am working with data where the scale of the numbers is very small indeed: for example, bacterial growth curves measured by a spectrophotometer yield values between 0 and 1 with a resolution of 0.01, and sigma values for various models around these growth curves are typically 0.05–0.1. If I sample from the aforementioned prior, I get a complete mess in the vast majority of simulations.
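The mismatch can be quantified directly from the exponential tail: under \sigma\sim\textrm{Exp}(\lambda), P(\sigma > x) = e^{-\lambda x}. A short sketch (the rates other than 1 and 10 are hypothetical choices for comparison, not recommendations):

```python
import math

def tail_prob(rate, x):
    """P(sigma > x) when sigma ~ Exp(rate): the survival function exp(-rate * x)."""
    return math.exp(-rate * x)

# If plausible sigma values top out around 0.1 (as for the growth-curve data),
# Exp(1) puts roughly 90% of its mass above that range:
print(tail_prob(1, 0.1))   # ≈ 0.905
print(tail_prob(10, 0.1))  # ≈ 0.368
print(tail_prob(50, 0.1))  # ≈ 0.007
```

So on this scale, an Exp(1) prior considers sigma values larger than the entire measurement range nine times out of ten, which matches the "complete mess" seen in the simulations.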

Thank you for any insight


Michael Betancourt has a lot of incredibly useful writing on topics that seem to be of relevance to your questions:

  1. Prior Modeling.
  2. Identifiability issues and their impact on model convergence
  3. Hierarchical modeling, which goes into depth about how prior and likelihood informativeness influences the geometry of hierarchical model posteriors and thus model convergence.
  4. A general discussion of principled model building.

Specific answers to your questions about the troubles you have had fitting models with data-informed priors are only possible with additional details about your data and the models you are trying to fit. The extent to which priors influence the efficiency and effectiveness of posterior sampling depends on the stringency of your priors, how informative your data is, and the type of model you are fitting.

As discussed in some of the writings linked above, convergence issues with certain priors may be due to the data not being informative enough to overcome degeneracies in the geometry of your posterior distribution. While more informative priors can overcome these issues, you should only use priors as informative as your knowledge of the system permits. Thus, sometimes more data, or a different kind of data, is the key to model convergence.