Difference of calibration and forecast with examples

Hello, may I ask some questions on calibration?

  1. What might the quote from the attached paper, “calibration is not a useful measure of the goodness of forecasts”, mean? Would a misspecified model that is well calibrated be an example?

  2. Do you know of an opposite example, where the predictive check is good but the SBC result is bad? Would that necessarily flag a problem in the computation algorithm?

Thank you!

Dawid85_SelfCalibPriorNo_comment.pdf (611.5 KB)


Maybe the complete quote from Schervish will shed some light on what he’s trying to get at:

The purpose of probabilistic induction is not to be able to eventually determine what nature will do next, but rather to learn what we can from what nature has done in the past and to alter accordingly our uncertainty about what nature is going to do next (compare Jeffreys 1961, chap. 1). As such, it is not at all clear what role, if any, calibration has to play in probabilistic induction.



2 is an interesting question, but it might be important to acknowledge a possible equivocation on the word “calibration.” Your attached paper uses calibration to evaluate forecasts, whereas the paper that proposes SBC (https://arxiv.org/pdf/1804.06788.pdf) uses it to diagnose “some error in the Bayesian analysis.” It’s my first time reading it, but I found the following: “[c]onsequently, any discrepancy between the data averaged posterior (1) and the prior distribution indicates some error in the Bayesian analysis. This error can come either from inaccurate computation of the posterior or a mis-implementation of the model itself.”
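To make the SBC sense of “calibration” concrete, here is a minimal sketch of the rank-statistic check from the linked paper, using a toy conjugate normal-normal model so the posterior can be sampled exactly (the model, sample sizes, and seed are my own illustrative choices, not from the paper): draw a parameter from the prior, simulate data, draw from the posterior, and record the rank of the prior draw among the posterior draws. When the analysis is correct, these ranks are uniform; a skewed or U-shaped rank histogram signals the “error in the Bayesian analysis” the quote describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model: theta ~ N(0, 1), y_i | theta ~ N(theta, 1).
# The posterior is available in closed form, so "fitting" is exact here;
# in practice the posterior draws would come from MCMC.
n_sims, n_obs, n_post = 1000, 10, 99
ranks = np.empty(n_sims, dtype=int)

for s in range(n_sims):
    theta = rng.normal(0.0, 1.0)                # draw from the prior
    y = rng.normal(theta, 1.0, size=n_obs)      # simulate data given theta
    # Exact conjugate posterior: precision 1 + n_obs
    post_var = 1.0 / (1.0 + n_obs)
    post_mean = post_var * y.sum()
    draws = rng.normal(post_mean, np.sqrt(post_var), size=n_post)
    # Rank of the prior draw among the posterior draws
    ranks[s] = int((draws < theta).sum())

# Under a correct analysis, ranks are uniform on {0, ..., n_post},
# so the mean rank should be close to n_post / 2 = 49.5.
print(ranks.mean())
```

With an exact posterior, the rank histogram is flat up to Monte Carlo noise; if you deliberately mis-implement the model (say, fit with the wrong prior scale), the ranks pile up at the edges, which is precisely how SBC flags a problem.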

Regarding 1, the first theorem in “The Well-Calibrated Bayesian” shows that any coherent “prior” (not really a prior, more like a joint distribution over a countably infinite time series) will always believe itself to be calibrated. In other words, loosely speaking, all coherent models consider themselves well calibrated, so why bother checking? Good models, bad models, everything.

That same paper gives a specific example of a well-calibrated model that forecasts poorly: “Murphy and Winkler (1977) show that experienced weather forecasters are, on the whole, well calibrated. Although this is not by itself a sufficient condition for their forecasts to be “good” (it would hold, for example, for a forecaster who invariably gave the long-term relative frequency of rain as his precipitation probability), it has often been taken to be a minimal desirable property.”
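The climatology forecaster from that quote is easy to simulate. Below is a small sketch with made-up numbers (the 0.3 base rate, the “wet-pattern” structure, and the probabilities are my own illustration, not from Murphy and Winkler): a forecaster who always issues the long-run frequency of rain is perfectly calibrated, yet scores worse than a forecaster who uses the available information.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: rain has long-run frequency 0.3 overall, but on
# "wet-pattern" days (observable in advance) it rains with probability 0.6,
# and on other days with probability 0.1: 0.4 * 0.6 + 0.6 * 0.1 = 0.30.
n = 100_000
wet_pattern = rng.random(n) < 0.4
p_true = np.where(wet_pattern, 0.6, 0.1)
rain = rng.random(n) < p_true

# "Climatology" forecaster: always predicts the base rate 0.3.
forecast = np.full(n, 0.3)

# Calibration check: among days with forecast 0.3 (all of them),
# the observed frequency of rain is about 0.3 -- well calibrated.
calibration = rain[forecast == 0.3].mean()
print(calibration)

# But the forecast carries no information: compare Brier scores
# (mean squared error of the forecast probability against the outcome).
brier_const = np.mean((forecast - rain) ** 2)
brier_informed = np.mean((p_true - rain) ** 2)
print(brier_const, brier_informed)
```

The constant forecast passes the calibration check yet has a strictly worse Brier score than the informed forecast, which is exactly Dawid’s point: calibration is at best a minimal desirable property, not a measure of how good the forecasts are.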