Numerical values for priors

I’m confused about this turn in the discussion. If the prior predictive distribution does not cover the data, this can indeed be a problem. Not always (I think in this thread I’ve already linked to our paper on the prior and the likelihood: http://www.stat.columbia.edu/~gelman/research/published/entropy-19-00555-v2.pdf), but mismatch between data and prior predictive distribution can be a concern. At the very least, it’s something the user should be aware of. As Meng, Stern, and I discussed in our 1996 paper, prior predictive checks and posterior predictive checks are both part of a more general family of checks, depending on which variables in the joint distribution are conditioned on and which are averaged over.
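To make that concrete, here is a toy sketch of what a prior predictive check can look like; the model and all the numbers are invented for illustration and are not from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy model: y ~ normal(mu, sigma), with hypothetical priors
# mu ~ normal(0, 10) and sigma ~ half-normal(5).
y_obs = np.array([4.2, 5.1, 3.8, 4.9, 5.5])  # pretend these are the observed data

n_sims = 1000
mu_draws = rng.normal(0, 10, size=n_sims)
sigma_draws = np.abs(rng.normal(0, 5, size=n_sims))

# Prior predictive: average over the prior, condition on nothing.
y_rep = rng.normal(mu_draws[:, None], sigma_draws[:, None],
                   size=(n_sims, len(y_obs)))

# Crude check: where does the observed mean fall among the simulated means?
sim_means = y_rep.mean(axis=1)
print(np.mean(sim_means < y_obs.mean()))  # values near 0 or 1 flag a mismatch
```

A posterior predictive check has the same structure; the only difference is that the parameter draws are conditioned on the observed data rather than averaged over the prior.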

My comment was mostly about the opposite concern, which is when the prior is tailored so that the prior predictive distribution looks too much like the data that happened to be observed and is inconsistent with other plausible data that could have been observed. I agree that not covering the data is also a problem. I think we discuss that a bit in the Visualization in Bayesian Workflow paper.


Isn’t Jonah’s comment,

more to your point? Maybe some of this is semantics, but I still find general proscriptions against using data for priors vague: all information is some form of data, and the knowledge that informs our priors is no exception. I like Andrew’s characterization: “external information”. To Jonah’s point, we shouldn’t limit our prior to one batch of observed data (external to the data used for inference); instead we should also reason about plausible potential outcomes that are not observed or reflected in that batch, and encode what’s plausible as a prior. But that external batch of data can still help us reason about plausible potential outcomes.

And in the other direction, considering whether the prior covers the modeled data is another way of checking whether we’ve considered plausible outcomes with our priors.
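As a toy sketch of the first point, using an external batch of data to anchor a prior without pinning it down; all the numbers here are hypothetical:

```python
import numpy as np

# Hypothetical: an external batch of data (not the data we will fit) suggests an
# effect around 0.3 with a standard error of roughly 0.1.
external_mean, external_se = 0.3, 0.1

# Rather than using normal(0.3, 0.1) directly, widen the scale so the prior also
# covers plausible outcomes that this particular batch happened not to show
# (e.g., null or mildly negative effects).
prior_mean = external_mean
prior_sd = 3 * external_se  # the inflation factor is a judgment call, not a rule

rng = np.random.default_rng(2)
draws = rng.normal(prior_mean, prior_sd, size=10_000)
print(np.quantile(draws, [0.05, 0.95]))  # is the implied range scientifically plausible?
```

The external estimate sets the location, while the widened scale leaves room for outcomes that the external batch did not happen to produce.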

Hi Tiny, what is the level of difficulty of conducting the sort of experiments you do, and the time frame and financial commitment, as practical considerations? Could you try splitting your experiments, e.g., for each experiment first collecting a small amount of data (or a large amount, depending on ease)? Run a basic analysis, either simply looking at descriptive statistics or using a very weak set of priors, drawing on your domain expertise as much as possible. Then use the posterior and what you have learned from this initial set of analyses to set more informed priors for the complete analysis of the larger amount of data you then collect, e.g., once you know the order of magnitude you are working with. These priors could conceivably be more informative than the posterior from your preliminary experiment, because the model may not update as strongly as you can by combining your knowledge with what you see going on in the preliminary experiment. The key thing is that this initial ‘peek’ is not a peek at the final data you will use, and so it is less like double-dipping in the analysis.
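In code, the kind of thing I have in mind (a conjugate-normal toy with a known sd, just to keep the sketch short; all the numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(3)

# Stage 1: small preliminary experiment, analyzed with a deliberately weak prior.
y_pilot = rng.normal(5.0, 2.0, size=8)  # stand-in for the pilot measurements
sd = 2.0                                # treated as known only for this sketch
prior_mu, prior_sd = 0.0, 50.0          # very weak prior on the mean

# Conjugate update for the mean of a normal with known sd.
post_prec = 1 / prior_sd**2 + len(y_pilot) / sd**2
post_mu = (prior_mu / prior_sd**2 + y_pilot.sum() / sd**2) / post_prec
post_sd = np.sqrt(1 / post_prec)

# Stage 2: soften the pilot posterior and use it as the prior for the main analysis,
# encoding "don't trust the pilot's numbers alone more than your domain knowledge".
main_prior_mu = post_mu
main_prior_sd = 2 * post_sd  # the widening factor is domain judgment
print(main_prior_mu, main_prior_sd)
```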

Is this a feasible approach (practically and philosophically), anyone?

Jimbob: I think you’d be better off just fitting a hierarchical model.
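For concreteness, one hypothetical way that could look, with the pilot and main experiments modeled jointly and partially pooled (sketched in PyMC; the setup is assumed, not anyone's actual design):

```python
import numpy as np
import pymc as pm

# Made-up stand-ins for the pilot and main measurements.
y_pilot = np.array([4.8, 5.3, 4.1, 5.0])
y_main = np.array([4.6, 5.1, 4.9, 5.4, 4.7, 5.2, 4.8, 5.0])

with pm.Model():
    # Population-level mean and between-experiment variation.
    mu = pm.Normal("mu", 0, 10)
    tau = pm.HalfNormal("tau", 2)
    # One latent mean per experiment, partially pooled toward mu.
    theta = pm.Normal("theta", mu, tau, shape=2)
    sigma = pm.HalfNormal("sigma", 2)
    pm.Normal("obs_pilot", theta[0], sigma, observed=y_pilot)
    pm.Normal("obs_main", theta[1], sigma, observed=y_main)
    idata = pm.sample()
```

That way the pilot data inform the main experiment through the shared hyperparameters, without a manual posterior-to-prior hand-off.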