I’ve fitted a regression model and now have observations y and samples from the posterior predictive distribution y_{rep}.

One idea to test the model was to see if the predictive intervals are correctly calibrated, e.g. a posterior predictive interval that covers 10% density should contain the observation y in 10% of the cases (is this true?).

For the plot below I went through all pairs of (y_i, y_{rep_{i}}), calculated an x% interval (based on quantiles) for the draws of y_{rep_{i}} and counted how often y_{i} is in that interval.

The error bars are 95% intervals of a Beta distribution with \alpha = (#observations inside the interval) + 1 and \beta = (#observations outside the interval) + 1.
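In code, that procedure might look roughly like this (a minimal sketch with numpy/scipy and simulated stand-in data, since the actual model output isn't shown here):

```python
import numpy as np
from scipy import stats

def coverage_with_errorbars(y, y_rep, probs=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Empirical coverage of central predictive intervals, with 95% Beta
    error bars (alpha = hits + 1, beta = misses + 1) as described above.
    y has shape (n,); y_rep has shape (n_draws, n)."""
    rows = []
    for p in probs:
        # Central p-interval per observation, from quantiles of its draws
        lo = np.quantile(y_rep, (1 - p) / 2, axis=0)
        hi = np.quantile(y_rep, (1 + p) / 2, axis=0)
        hits = int(np.sum((y >= lo) & (y <= hi)))
        misses = len(y) - hits
        lower, upper = stats.beta.interval(0.95, hits + 1, misses + 1)
        rows.append((p, hits / len(y), lower, upper))
    return rows

# Stand-in data: a perfectly specified model, just to exercise the function
rng = np.random.default_rng(1)
y = rng.normal(size=200)
y_rep = rng.normal(size=(4000, 200))
for p, cov, lo, hi in coverage_with_errorbars(y, y_rep):
    print(f"{p:.0%} interval: coverage {cov:.2f} in [{lo:.2f}, {hi:.2f}]")
```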

Interestingly, my posterior predictive distributions seem to be a bit too wide, e.g. ~55% of my data points are within the 50% predictive interval.


Not 10% probability *density* but rather 10% *probability* itself, or sometimes denoted *probability mass* to emphasize the difference from density.

It is not – there is no guarantee that intervals of the posterior predictive distribution will have well-defined coverage with respect to the true data generating process unless your inferences are perfect.

Ideally the posterior predictive distribution will *converge* to the true data generating process in the limit of infinite data. For any finite data set, however, inferences about the model configurations will be uncertain and the posterior predictive distribution will end up averaging over many possible data generating processes. If the model includes the true data generating process then this will result in wider intervals and higher coverage than expected, which is consistent with what you see. If the model doesn’t include the true data generating process, which is the more realistic option, then you might also see biases in the coverage.

That said, there is an additional complication here because the data are being used both to compute the posterior (and hence the posterior predictive distribution and its intervals) *and* to evaluate the coverage of those intervals. That correlation can have complex effects on the coverage behavior.
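That double use of the data can be seen in a toy simulation (a sketch assuming a normal model with known sigma and a flat prior on the mean, not the original regression model): each observation is correlated with the fitted mean, so the same-data coverage of the central 50% posterior predictive interval comes out above 50%, much like the ~55% reported above.

```python
import numpy as np

rng = np.random.default_rng(0)

def in_sample_coverage(n=5, sigma=1.0, n_draws=4000, prob=0.5):
    """One simulated dataset: fit a normal mean (known sigma, flat prior),
    draw from the posterior predictive, and check how many of the *same*
    observations fall inside the central `prob` predictive interval."""
    y = rng.normal(0.0, sigma, size=n)
    # Posterior for mu is N(ybar, sigma^2 / n); posterior predictive draws:
    mu_draws = rng.normal(y.mean(), sigma / np.sqrt(n), size=n_draws)
    y_rep = rng.normal(mu_draws, sigma)
    lo, hi = np.quantile(y_rep, [(1 - prob) / 2, (1 + prob) / 2])
    return np.mean((y >= lo) & (y <= hi))

cov = np.mean([in_sample_coverage() for _ in range(2000)])
print(round(cov, 3))  # noticeably above the nominal 0.5
```

With n = 5 the analytic same-data coverage of the 50% interval is about 0.59, because y_i - ybar has standard deviation sigma * sqrt(1 - 1/n) while the predictive interval is scaled by sqrt(1 + 1/n).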


What @betanalpha said. The tutorial also has an example of how to use loo predictive distributions, so that the same data are not used both to compute the intervals and to evaluate their coverage. That helps with that specific issue, but the other problems @betanalpha described remain.
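As a rough illustration of the loo idea (a sketch using plain importance sampling in the same toy conjugate normal model as above; the loo package uses Pareto-smoothed weights, which are much more stable than this):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Toy conjugate setup: normal data, known sigma, flat prior on mu,
# so posterior draws for mu are available in closed form.
n, sigma, n_draws = 100, 1.0, 4000
y = rng.normal(0.0, sigma, size=n)
mu_draws = rng.normal(y.mean(), sigma / np.sqrt(n), size=n_draws)
y_rep = rng.normal(mu_draws, sigma)  # one predictive draw per posterior draw

def loo_interval(i, prob=0.5):
    """Central `prob` leave-one-out predictive interval for observation i,
    via importance weights w_s proportional to 1 / p(y_i | mu_s)."""
    w = 1.0 / stats.norm.pdf(y[i], mu_draws, sigma)
    w /= w.sum()
    order = np.argsort(y_rep)
    cw = np.cumsum(w[order])  # weighted empirical CDF of y_rep
    lo_idx = min(np.searchsorted(cw, (1 - prob) / 2), n_draws - 1)
    hi_idx = min(np.searchsorted(cw, (1 + prob) / 2), n_draws - 1)
    return y_rep[order][lo_idx], y_rep[order][hi_idx]

hits = 0
for i in range(n):
    lo, hi = loo_interval(i)
    hits += lo <= y[i] <= hi
cov = hits / n
print(round(cov, 2))  # should sit near the nominal 0.5
```

Because each interval is (approximately) computed without observation i, the same-data over-coverage from the in-sample check goes away in this well-specified example.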


Thank you, that was really insightful!