Interpreting SBC in a Bayesian Workflow

Hello, Stan community!

First, thank you all for this fantastic forum; I learn so much here!

I ran simulation-based calibration (SBC) on a hierarchical generalized linear model, and I noticed that each SBC run is much faster than fitting the model to my actual data.
I was wondering whether I could interpret this as a sign that my model for the outcome y is “incorrect”. After all, the data are the only difference between the two: the SBC runs fit y_generated, while the actual fit uses the observed y.

Thank you all!
Mattia

Assuming that the distribution and sample sizes of the data across covariates are identical in the SBC and in your real dataset, and assuming this difference in speed is consistent both across repeated runs of the full model and across all of a reasonably large number of SBC runs, this interpretation seems likely. The intuition here is that if the wall time is a reliable-ish function of the simulated data, then this is essentially a posterior predictive check, except that SBC draws its parameter values from the prior. To make it even more like a posterior predictive check, you could draw the parameter values in SBC not from the prior, but rather from the posterior. As with posterior predictive checks, to feel confident of misspecification you would want to see that your wall time is atypical compared to a large majority of the wall times on the simulated datasets. If it’s slower than 90% but similar to 10%, that wouldn’t be particularly strong evidence for misspecification.
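To make the "atypical compared to a large majority" comparison concrete, here is a minimal sketch in Python. It assumes you have already recorded the wall time of the real fit and of each fit to simulated data; the function name `tail_fraction` and all of the timings are hypothetical and only there for illustration.

```python
def tail_fraction(observed_time, simulated_times):
    """Fraction of simulated-data fits that took at least as long as the real fit."""
    if not simulated_times:
        raise ValueError("need at least one simulated wall time")
    n_as_slow = sum(1 for t in simulated_times if t >= observed_time)
    return n_as_slow / len(simulated_times)


# Hypothetical wall times in seconds (made up, not from any real run).
sbc_times = [41.0, 38.5, 44.2, 40.1, 39.7, 43.3, 37.9, 42.6, 40.8, 95.0]
real_fit_time = 90.0

frac = tail_fraction(real_fit_time, sbc_times)
print(f"{frac:.0%} of simulated fits were at least as slow as the real fit")
```

With these made-up numbers the real fit is slower than 90% of the simulated fits but similar to the remaining 10%, which is the situation described above as not particularly strong evidence. In practice you would want many more than ten simulated fits, plus repeated timings of the real fit, before reading anything into the fraction.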

With all of that said, formal posterior predictive checking with judiciously chosen discrepancy functions will almost surely give you better, deeper, and more precise insight into whether and how your model is misspecified. I wouldn’t rely on wall time from fitting as a particularly reliable or useful discrepancy function. It’s noisy (it’s a nondeterministic function of yrep), and it tells you almost nothing about the nature or manifestation of a problem even if it identifies that a problem exists, not to mention that it’s extremely expensive to compute! A good discrepancy function should suggest a modification to the model to reduce the identified misspecification.
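For contrast, here is a rough sketch of a posterior predictive check with an explicit discrepancy function, again in Python. Everything in it is an assumption for illustration: the helper names `ppc_p_value` and `prop_zeros` are hypothetical, the data are invented, and in a real workflow `y_reps` would be the replicated datasets drawn from your fitted model rather than hand-typed lists. The proportion of zeros is just one possible discrepancy for count data.

```python
def ppc_p_value(y_obs, y_reps, discrepancy):
    """Compare a discrepancy on the observed data with its values on replicated data.

    Returns the observed value, the replicated values, and the fraction of
    replicates whose discrepancy is at least as large as the observed one.
    """
    t_obs = discrepancy(y_obs)
    t_reps = [discrepancy(y_rep) for y_rep in y_reps]
    p = sum(1 for t in t_reps if t >= t_obs) / len(t_reps)
    return t_obs, t_reps, p


def prop_zeros(y):
    """Discrepancy: proportion of observations equal to zero."""
    return sum(1 for v in y if v == 0) / len(y)


# Invented count data, for illustration only.
y_obs = [0, 0, 3, 0, 1, 0, 5, 0]
y_reps = [
    [1, 0, 2, 1, 3, 0, 2, 1],
    [0, 1, 1, 2, 0, 1, 4, 2],
    [2, 1, 0, 1, 1, 3, 1, 0],
    [1, 2, 1, 0, 2, 1, 0, 0],
]

t_obs, t_reps, p = ppc_p_value(y_obs, y_reps, prop_zeros)
print(f"observed proportion of zeros: {t_obs:.3f}")
print(f"replicated proportions: {t_reps}")
print(f"fraction of replicates with at least as many zeros: {p:.2f}")
```

In this toy example none of the replicated datasets has as many zeros as the observed data. Unlike a slow fit, that kind of result points toward a specific thing to try, such as a zero-inflated likelihood. It is also cheap and deterministic given the replicates.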

Thank you Jacob!