Prior Predictive Checks

anon79882417 · October 10, 2018, 6:54am

Hey all -

So I’ve seen classic examples where, say, we have a binomial likelihood (a coin flip) and we want something like a N(.5,1) as a prior. (If we’re going for conjugacy, we want a beta-binomial, but that’s not what we’re discussing.)

So - I’m very fortunate to have come across Gabry et al’s “A visualization in …”.

I’m ok with posterior predictive checks. I’ve made blatantly obvious mistakes in the past, i.e. my “applied GPs in Stan” post. But I want to formalize the idea of a prior predictive check, especially when the likelihood is unknown. In this paper, I’m given visualizations with no code and it’s hard for me to formalize the idea.

For the coin flip example. I have a binomial RV. What’s P? I don’t know. Is my best guess still N(.5,1), or should I estimate from the data what my prior should be? What if the coin is bullshit, and it’s like a binomial with p=.0000000001, and I’ve guessed .5? I can do a simulation to show what my honest guesses do, but I want a more general answer with different likelihood functions. How much weight is the prior carrying?
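For the coin flip case, a prior predictive check can be sketched in a few lines: draw p from the prior, draw data from the likelihood given that p, and compare the simulated data to what you consider plausible. The sketch below is only an illustration, not code from the Gabry et al. paper. It assumes a Beta(a, b) prior rather than N(.5,1), and the function names (`prior_predictive_heads`, `fraction_at_most`, `prior_weight`) are made up for this example. The last function addresses the "how much weight is the prior carrying" question for the conjugate case only.

```python
import random


def prior_predictive_heads(a, b, n_flips, n_sims, seed=0):
    """Simulate head counts from the prior predictive distribution.

    Each simulation draws p ~ Beta(a, b) and then flips n_flips coins
    with success probability p. No observed data enters the simulation.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        p = rng.betavariate(a, b)
        heads = sum(rng.random() < p for _ in range(n_flips))
        draws.append(heads)
    return draws


def fraction_at_most(draws, observed):
    """Share of simulated head counts that are <= the observed count."""
    return sum(d <= observed for d in draws) / len(draws)


def prior_weight(a, b, n_flips):
    """Weight of the prior mean in the Beta-Binomial posterior mean.

    posterior mean = w * a / (a + b) + (1 - w) * heads / n_flips,
    with w = (a + b) / (a + b + n_flips).
    """
    return (a + b) / (a + b + n_flips)
```

For example, `fraction_at_most(prior_predictive_heads(20, 20, 100, 1000), 0)` asks how often a prior concentrated around 0.5 produces zero heads in 100 flips. A value near zero says that a coin with p close to 0 would conflict with that prior. For non-conjugate likelihoods the same two-step simulation applies, but `prior_weight` has no such closed form.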

Any papers/case studies/plain obvious examples I should look at?

FYI: posterior predictive checks - all about it! I regularly reject models for unrealistic posterior predictive checks. The classical analogy is obvious extrapolation… same deal in machine learning and applied math…

Thanks all,

Andre

bgoodri · October 10, 2018, 7:29am

anon79882417 · October 17, 2018, 5:08am

Cool. I see this code, and it’s extremely simple.

We only specify groups and number of observations, the priors and the likelihood with no data.

With packages like rstanarm is there a way that I can easily simulate from the prior predictive, or need I dump the code out and recreate what’s done in Gabry’s Bayes-Vis paper?

I’m looking at posterior_vs_prior in rstanarm, but it’s not looking like I’m generating observations from the prior predictive.

Am I missing something? Did I not dig enough into the code?

yuling · October 17, 2018, 7:32pm

The prior reflects your uncertainty before seeing the data. N(.5,1) seems hard to justify, as theta should at least live within the (0,1) interval. Beta(a, b) is more natural in terms of conjugacy. After all, it is a trivial exponential family.

In decision theory you can ask for a non-informative prior, as these are also connected to minimax. In this case the non-informative prior gives you the improper density 1/(p(1-p)), which can be approximated by beta(epsilon, epsilon). It is a flat prior on the logit scale, log(p / (1 − p)).

Again this reminds me of the merit of boundary-avoiding priors: an otherwise perfect beta(0, 0) has just the opposite effect. A boundary-avoiding prior in the logit space can be a boundary-embracing prior in the p space.
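A quick simulation can make the "boundary-embracing" behaviour concrete. The sketch below uses epsilon = 0.01 as an arbitrary stand-in for the limiting beta(0, 0), and `boundary_mass` is a hypothetical helper written for this illustration. It estimates how much prior mass sits within a small distance of 0 or 1.

```python
import random


def boundary_mass(a, b, width=0.01, n_sims=5000, seed=0):
    """Estimate the share of Beta(a, b) draws within `width` of 0 or 1."""
    rng = random.Random(seed)
    near = 0
    for _ in range(n_sims):
        p = rng.betavariate(a, b)
        if p < width or p > 1.0 - width:
            near += 1
    return near / n_sims
```

Comparing `boundary_mass(0.01, 0.01)` with `boundary_mass(1, 1)` and `boundary_mass(2, 2)` shows the pattern: the beta(epsilon, epsilon) prior puts most of its mass next to the boundaries, the flat prior puts about 2% there, and beta(2, 2) puts almost none.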

On the other hand, if I only put a flat prior on p, then it is converted into a boundary-avoiding prior on the logit scale. This seems even more correct, as Stan only samples from the unconstrained space.
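Both statements about the logit scale follow from the change-of-variables formula. A short derivation, writing \theta for the logit of p and \sigma for the inverse logit (notation introduced here for the illustration):

```latex
\theta = \log\frac{p}{1-p}, \qquad
p = \sigma(\theta) = \frac{1}{1+e^{-\theta}}, \qquad
\frac{dp}{d\theta} = p\,(1-p)

\pi_\theta(\theta) = \pi_p\bigl(\sigma(\theta)\bigr)\,\sigma(\theta)\bigl(1-\sigma(\theta)\bigr)

% Case 1: the prior 1/(p(1-p)) on p is flat on the logit scale
\pi_p(p) \propto \frac{1}{p\,(1-p)}
\;\Longrightarrow\;
\pi_\theta(\theta) \propto 1

% Case 2: a flat prior on p is a standard logistic density on the logit scale
\pi_p(p) = 1
\;\Longrightarrow\;
\pi_\theta(\theta) = \frac{e^{-\theta}}{\bigl(1+e^{-\theta}\bigr)^{2}}
\;\to\; 0 \quad \text{as } |\theta| \to \infty
```

In the second case the density decays in both tails of the unconstrained space, which is the sense in which a flat prior on p avoids the boundary on the logit scale.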

It is like optimization, where I can add epsilon*Identity to remedy degeneracy. Now a boundary-avoiding prior adds log-convexity.

Finally, beyond the benefit of smoother sampling, a boundary-avoiding prior is dangerous. If there is actual prior-data conflict, I will completely miss that. I believe this is why prior predictive checking is emphasized.
