Best practices for Simulation Based Calibration with hierarchical models

betanalpha · December 11, 2020, 8:57pm

SBC doesn’t require that that the estimators are exact, just that they’re unbiased. Consequently the only challenge in implementing SBC for Markov chain Monte Carlo is removing the autocorrelations as much as possible. Increasing the number of samples just makes the SBC assessment more sensitive to potential problems.

To be clear SBC assesses the accuracy of a sampling produce, and implicitly posterior expectation value estimation, within the context of a specified model, not the model itself. The method says nothing about whether the specified model is useful in any given application.

The problem is that there is no single deviation from uniformity. The rank histogram can deviation in many different ways, and each distinct deviation says something different about the nature of the problem. Even if a single summary/test is designed to capture interpretable deviations, such as those discussed in the paper, it will largely ignore other possible deviations.

Many uniformity tests, for example Kolmogorov-Smirnov, are based on statistics that don’t correspond to any particularly interpretable deviation in the SBC case, and hence aren’t all that useful for automated testing. We considered trying to come up with template-like tests to capture the smiles/frowns/tilts discussed in the paper but ultimately the rank histogram was the most information dense way of presenting the results.

At the same time recall that even in high-dimensional models there are often only a few parameters/summaries of interest to the final application, and SBC is much more productive when those parameters are prioritized instead of trying to test every parameter at once.

In my opinion Wasserstein is most usefully interpreted as an integral probability metric that bounds differences the expectation values of certain sets of functions instead of just differences of probabilities. See for example the discussion in https://betanalpha.github.io/assets/case_studies/markov_chain_monte_carlo.html#33_extra_credit:_theoretical_convergence.

Topic		Replies	Views
How do I evaluate my model, in terms of bias, coverage, etc.? General simulation-based-calibration	17	2985	January 4, 2023
Simulation-based calibration case study with RStan Algorithms simulation-based-calibration	31	3114	July 30, 2019
Using narrower priors for SBC General prior-choice , simulation-based-calibration	14	1508	August 6, 2021
Rewriting example models with SBC Developers simulation-based-calibration	8	1344	February 25, 2019
How to evaluate samplers for inclusion in Stan? Developers	15	454	March 8, 2025

Best practices for Simulation Based Calibration with hierarchical models

Related topics