When could SBC complement predictive checks?

I am trying to pin down the situations where SBC is needed to complement predictive checks. Can anyone share their thoughts or recommend literature on the following questions, please?

  1. What are example situations where a self-inconsistent model that nevertheless gives decent predictive results (a false positive) can be a problem?

  2. For cases where only good predictive inference is needed, what could go wrong if the model is self-inconsistent?

  3. SBC’s role in model validation? Can it detect model misspecification?
    As Section 6.1 of Talts et al. (2018) is titled ‘Misspecified prior’, I thought SBC could detect model misspecification. However, model misspecification refers to the situation where the model cannot match the data-generating process. Since the data are generated from the model itself in SBC, the model is always well specified.
    \pi(\theta) \simeq \int \mathrm{d}\tilde{y} \, \mathrm{d}\tilde{\theta} \; \pi_{\text{approx}}(\theta \mid \tilde{y}) \, \pi(\tilde{y} \mid \tilde{\theta}) \, \pi(\tilde{\theta})
    To be precise, inconsistency is observed when the approximate posterior produced by the computational algorithm is not close enough to that of the original model, i.e. \pi(\theta \mid y) and \pi_{\text{approx}}(\theta \mid y) differ substantially. However, the identity above holds for any choice of \pi(\tilde{y} \mid \tilde{\theta}) \pi(\tilde{\theta}). So what word should we use to describe the work SBC is believed to do in terms of model validation? Could the difference between the true distribution and the finite-sample distribution be one candidate?
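To make the identity concrete, here is a minimal SBC sketch (my own toy example, not from any of the cited papers): a conjugate normal mean model where the exact posterior is available in closed form, so the rank statistics should come out uniform.

```python
import numpy as np

# Toy SBC sketch (assumed example): normal mean with known noise sd and a
# conjugate normal prior, so the exact posterior is available and the SBC
# rank histogram should be flat.
rng = np.random.default_rng(0)
prior_mu, prior_sd, obs_sd, n_obs = 0.0, 1.0, 1.0, 10
n_sims, n_draws = 1000, 100

ranks = []
for _ in range(n_sims):
    theta = rng.normal(prior_mu, prior_sd)        # tilde-theta ~ prior
    y = rng.normal(theta, obs_sd, size=n_obs)     # tilde-y ~ pi(y | theta)
    # exact conjugate posterior pi(theta | y)
    post_prec = 1 / prior_sd**2 + n_obs / obs_sd**2
    post_mu = (prior_mu / prior_sd**2 + y.sum() / obs_sd**2) / post_prec
    draws = rng.normal(post_mu, post_prec**-0.5, size=n_draws)
    ranks.append(int((draws < theta).sum()))      # rank statistic in {0..n_draws}

ranks = np.asarray(ranks)
# Averaging the exact posterior over prior-predictive data recovers the
# prior, so the binned rank counts should look roughly uniform.
counts, _ = np.histogram(ranks, bins=10, range=(0, n_draws + 1))
print(counts)
```

Nothing in this loop ever compares the model to real data, which is exactly the point of question 3: the identity holds for any choice of prior and likelihood.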

For 1 and 2, comments from the authors of the following two papers, which focus heavily on the predictive behavior of the model (if I understood correctly), would be very helpful!
Projection Predictive Inference for Generalized Linear and Additive Multilevel Models @AlejandroCatalina @paul.buerkner @avehtari
A Decision-Theoretic Approach for Model Interpretability in Bayesian Framework @to-mi @jpiironen @avehtari

Note) the validation-methodology comparison in Yao et al. (2018) concentrates on the parameter distribution. It explains that the VSBC diagnostic can complement the PSIS diagnostic when the VI posterior produces good point estimates even though the underlying distribution differs starkly from the true posterior.

Thank you.


Finally had time to take a look at this! Sorry for the delay.

I can share my thoughts regarding my experience and own research on projection predictive inference and variable selection in general.

In projpred we do variable selection by projecting predictive draws, which, for some models, may result in good predictive performance even though the projected parameters do not coincide with those of the full reference model. This is more common in models where the random intercepts can capture most of the variance in the outcome, so the variable selection may conclude that beyond the random intercepts nothing is truly relevant (if no restriction is placed on the model search). I figure SBC calibration can be important in this scenario to tell the user that while the model performs okay, the posterior calibration is not good.

On the other hand, calibrating the predictive distribution of the selected projection is also interesting, and we are currently doing some experiments ourselves in this regard. I assume that if the reference model is self-inconsistent there may be issues when computing the projections, or simply bad projections that don’t perform well? I haven’t really checked this; it’s just a thought. So in this scenario, running SBC on the reference model before variable selection is a good diagnostic.

Regarding your final point, I believe SBC is able to detect misspecified models, according to https://arxiv.org/pdf/1804.06788.pdf if I didn’t misunderstand it.

I hope this helps!


SBC and predictive checks are almost completely independent.

SBC checks how robust an algorithm is over an ensemble of posterior fits within the context of a given model. It has no sensitivity to how useful that assumed model might be.

Predictive, typically retrodictive, checks investigate any lacking behavior in the assumed model relative to the observed data.

The only interaction between the two is that principled (pre/retro)dictive checks require that the posterior is being faithfully quantified by whatever algorithm is being used; otherwise one can’t tell whether any disagreement is due to the model or to the algorithmic fit of the posterior. SBC provides one limited diagnostic of that algorithmic faithfulness. That’s why algorithmic calibrations like SBC should come before any predictive checks (Towards A Principled Bayesian Workflow).

It depends on how “predictive performance” is defined. Retrodictive performance metrics (how well the posterior predicts the data used to fit it) and pointwise predictive performance metrics (a single held-out data set) capture only a limited view of how close the posterior is to the true data-generating process, and of how well it will generalize to exact replications of a given measurement. See Towards A Principled Bayesian Workflow for much more.

When focusing on these metrics alone it’s easy to overfit with models that have nothing to do with what’s known about the data generating process. To be a bit cheeky, that’s kind of all of machine learning.

A generative model exploits one’s domain expertise to not only fit the observed data but hopefully generalize to other, similar measurements. Indeed it’s only with a generative model that one can use parts of the model to make inferences and predictions about other, counterfactual circumstances that one might see in the future. That’s why prior checks of the generative structure of a model are so important.

But SBC isn’t sensitive to any of these issues.

Again, it depends on what you mean by predictive inference. If all you have to do is make predictions for the same, static system then you just have to model the corresponding data generating process well enough, but even in that limited case most predictive metrics don’t fully quantify that predictive performance. At best they approximate it.

“Inconsistent” models might accidentally capture one observation well (for example, a delta function around any single observation) but then poorly capture every other observation.
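As a toy illustration of that last point (my own example, not from the thread): a near-delta predictive distribution concentrated on one observed point scores arbitrarily well on that point and catastrophically on any replicate data, while a model matching the data-generating process does fine on both.

```python
import numpy as np

rng = np.random.default_rng(1)
y_train = rng.normal(0.0, 1.0, size=1)    # a single observation from N(0, 1)
y_test = rng.normal(0.0, 1.0, size=50)    # replicate measurements

def normal_logpdf(y, mu, sd):
    # log density of N(mu, sd) evaluated at y
    return -0.5 * np.log(2 * np.pi * sd**2) - (y - mu)**2 / (2 * sd**2)

# "Delta-like" model: near-point mass on the one training observation.
delta_train = normal_logpdf(y_train, y_train[0], 1e-3).mean()  # excellent
delta_test = normal_logpdf(y_test, y_train[0], 1e-3).mean()    # disastrous

# Model matching the true data-generating process.
honest_test = normal_logpdf(y_test, 0.0, 1.0).mean()

print(delta_train, delta_test, honest_test)
```

The retrodictive log score of the delta-like model looks spectacular, which is exactly why such metrics alone can’t certify a model.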


All SBC does is check the consistency of an ensemble of posterior fits with the assumed prior model. This requires that the posterior fits are accurate and that the posteriors are implemented consistently with the prior predictive distribution used to simulate the data.
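A minimal way to see that requirement in action (again a toy conjugate-normal sketch of my own, not code from any paper): run the same SBC loop once with the exact posterior and once with a deliberately biased "algorithm" that overestimates the posterior sd. Only the biased run produces a clearly non-uniform rank histogram.

```python
import numpy as np

def sbc_chi2(sd_inflation, n_sims=1000, n_draws=100, n_obs=10, seed=0):
    """Chi-square statistic of the SBC rank histogram against uniformity.

    sd_inflation = 1.0 uses the exact conjugate posterior; values > 1.0
    mimic an algorithm that overestimates the posterior sd (toy setup,
    names are mine).
    """
    rng = np.random.default_rng(seed)
    ranks = []
    for _ in range(n_sims):
        theta = rng.normal(0.0, 1.0)                  # prior N(0, 1)
        y = rng.normal(theta, 1.0, size=n_obs)        # likelihood N(theta, 1)
        post_prec = 1.0 + n_obs                       # conjugate update
        post_mu = y.sum() / post_prec
        post_sd = sd_inflation * post_prec**-0.5
        draws = rng.normal(post_mu, post_sd, size=n_draws)
        ranks.append(int((draws < theta).sum()))
    counts, _ = np.histogram(ranks, bins=10, range=(0, n_draws + 1))
    expected = n_sims / 10
    return float(((counts - expected)**2 / expected).sum())

chi2_exact = sbc_chi2(1.0)   # small: flat rank histogram
chi2_biased = sbc_chi2(2.0)  # large: ranks pile up in the middle
print(chi2_exact, chi2_biased)
```

Note the failure detected here is purely algorithmic: the model itself is unchanged between the two runs, which is why a passing SBC says nothing about the model's adequacy for real data.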
