Calibration plot for predictive distribution, PPC

daniel_h · May 29, 2019, 11:31pm

I’ve fitted a regression model and now have observations y and samples from the posterior predictive distribution y_{rep}.
One idea to test the model was to see if the predictive intervals are correctly calibrated, e.g. a posterior predictive interval that covers 10% density should contain the observation y in 10% of the cases (is this true?).

For the plot below I went through all pairs of (y_i, y_{rep_{i}}), calculated an x% interval (based on quantiles) for the draws of y_{rep_{i}} and counted how often y_{i} is in that interval.
The error bars are 95% intervals of a Beta distribution with \alpha = #observations inside the interval + 1 and \beta = #observations outside the interval + 1.
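The procedure described above could be sketched roughly as follows. This is only an illustration, not the code behind the plot: the helper names (`quantile`, `coverage`, `beta_interval`) are made up here, the data are synthetic stand-ins for y and y_{rep}, and the Beta interval is approximated by Monte Carlo draws rather than an exact quantile function.

```python
import random

def quantile(sorted_xs, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (pos - lo) * (sorted_xs[hi] - sorted_xs[lo])

def coverage(y, y_rep, level):
    """Count observations inside their central `level` predictive interval."""
    inside = 0
    for y_i, draws in zip(y, y_rep):
        s = sorted(draws)
        lower = quantile(s, (1 - level) / 2)
        upper = quantile(s, (1 + level) / 2)
        inside += lower <= y_i <= upper
    return inside

def beta_interval(inside, outside, prob=0.95, n_draws=4000, rng=None):
    """Monte Carlo central interval of Beta(inside + 1, outside + 1)."""
    rng = rng or random.Random(0)
    s = sorted(rng.betavariate(inside + 1, outside + 1) for _ in range(n_draws))
    return quantile(s, (1 - prob) / 2), quantile(s, (1 + prob) / 2)

# Synthetic stand-in data (an assumption): observations and predictive
# draws come from the same standard normal distribution.
rng = random.Random(1)
n_obs, n_draws = 500, 400
y = [rng.gauss(0, 1) for _ in range(n_obs)]
y_rep = [[rng.gauss(0, 1) for _ in range(n_draws)] for _ in range(n_obs)]

for level in (0.1, 0.5, 0.9):
    k = coverage(y, y_rep, level)
    lo, hi = beta_interval(k, n_obs - k)
    print(f"{level:.0%} interval: observed {k / n_obs:.3f}, 95% band [{lo:.3f}, {hi:.3f}]")
```

With real output, `y_rep` would hold one list of posterior predictive draws per observation, in the same order as `y`.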

Interestingly, my posterior predictive distributions seem to be a bit too wide, e.g. ~55% of my data points are within the 50% predictive interval.

Is my approach even valid? I’ve only seen this in binned form for logistic regression models before (e.g. in @avehtari’s tutorial here: https://avehtari.github.io/modelselection/diabetes.html)
Should I be concerned for this specific case? All my other PPCs look okay.

betanalpha · May 30, 2019, 6:43pm

A point on terminology: the interval contains not 10% probability density but rather 10% probability itself, sometimes called probability mass to emphasize the difference from density.

As for whether such an interval should contain the observation in 10% of the cases: it is not true in general. There is no guarantee that intervals of the posterior predictive distribution will have well-defined coverage with respect to the true data generating process unless your inferences are perfect.

Ideally the posterior predictive distribution will converge to the true data generating process in the limit of infinite data. For any finite data set, however, inferences about the model configurations will be uncertain and the posterior predictive distribution will end up averaging over many possible data generating processes. If the model includes the true data generating process then this will result in wider intervals and higher coverage than expected, which is consistent with what you see. If the model doesn’t include the true data generating process, which is the more realistic option, then you might also see biases in the coverage.

That said, there is an additional complication here because the data are being used both to compute the posterior (and hence the posterior predictive distribution and its intervals) and to evaluate the coverage of those intervals. That correlation can have complex effects on the coverage behavior.
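A toy simulation can make the double use of the data concrete. The setup is an assumption chosen for simplicity, not the regression model from the question: normal data with known sigma and a flat prior on the mean, so the posterior predictive distribution is Normal(mean(y), sigma^2 (1 + 1/n)) in closed form. The sketch compares coverage of the 50% interval on the data used for fitting with coverage on fresh data from the same process.

```python
import random
from statistics import NormalDist

def coverage_once(n, level, rng, sigma=1.0, mu=0.0):
    """Return (in-sample, held-out) hit fractions for one simulated data set."""
    y = [rng.gauss(mu, sigma) for _ in range(n)]
    y_new = [rng.gauss(mu, sigma) for _ in range(n)]
    center = sum(y) / n
    # Posterior predictive under a flat prior on mu with sigma known:
    # Normal(mean(y), sigma^2 * (1 + 1/n))
    z = NormalDist().inv_cdf((1 + level) / 2)
    half_width = z * sigma * (1 + 1 / n) ** 0.5
    in_hits = sum(abs(v - center) <= half_width for v in y)
    out_hits = sum(abs(v - center) <= half_width for v in y_new)
    return in_hits / n, out_hits / n

rng = random.Random(0)
results = [coverage_once(5, 0.5, rng) for _ in range(4000)]
in_sample = sum(r[0] for r in results) / len(results)
held_out = sum(r[1] for r in results) / len(results)
print(f"in-sample coverage: {in_sample:.3f}, held-out coverage: {held_out:.3f}")
```

In this particular setting the in-sample coverage comes out above the nominal 50%, because each observation pulls the interval's center toward itself, while the held-out coverage stays close to nominal. Other models need not behave the same way.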

avehtari · May 30, 2019, 8:41pm

What @betanalpha said. The tutorial also has an example of how to use LOO (leave-one-out) predictive distributions, so that the same data are not used both to compute the intervals and to evaluate their coverage. That helps with that specific issue, but the other problems @betanalpha mentioned remain.
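To illustrate the leave-one-out idea only, here is a minimal sketch on an assumed toy model (normal data, known sigma, flat prior on the mean), where the predictive distribution for y_i given the other points is available in closed form. It refits exactly for each held-out point, which is not how the tutorial computes LOO predictive distributions; the function name `loo_coverage` is hypothetical.

```python
import random
from statistics import NormalDist

def loo_coverage(n, level, rng, sigma=1.0, mu=0.0):
    """Fraction of points inside their own leave-one-out predictive interval."""
    y = [rng.gauss(mu, sigma) for _ in range(n)]
    total = sum(y)
    z = NormalDist().inv_cdf((1 + level) / 2)
    # Predictive for y_i given the other n - 1 points (flat prior, known sigma):
    # Normal(mean of the others, sigma^2 * (1 + 1/(n - 1)))
    half_width = z * sigma * (1 + 1 / (n - 1)) ** 0.5
    hits = 0
    for y_i in y:
        center = (total - y_i) / (n - 1)
        hits += abs(y_i - center) <= half_width
    return hits / n

rng = random.Random(0)
runs = [loo_coverage(5, 0.5, rng) for _ in range(4000)]
loo_cov = sum(runs) / len(runs)
print(f"LOO coverage of the 50% interval: {loo_cov:.3f}")
```

Because each point is excluded from the fit that produces its own interval, the coverage in this toy case should sit near the nominal level. That only holds here because the assumed model matches the simulated data generating process.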

daniel_h · May 31, 2019, 11:12am

Thank you, that was really insightful!
