That old case study is long out of date! More importantly it considers only *computational problems* and not *modeling* problems. My up-to-date writing is at Writing - betanalpha.github.io and for this topic I’d recommend taking a look at

GitHub - betanalpha/mcmc_diagnostics: Markov chain Monte Carlo general, and Hamiltonian Monte Carlo specific, diagnostics for Stan

Identity Crisis

Towards A Principled Bayesian Workflow

with emphasis on the latter.

Recall that in Bayesian inference we use our domain expertise to motivate a full Bayesian model,

\pi(y, \theta) = \pi(y \mid \theta) \, \pi(\theta),

plug in observed data \tilde{y} to obtain a posterior distribution,

\pi(\theta \mid \tilde{y}) \propto \pi(\tilde{y}, \theta),

and then extract approximate insights from the posterior distribution through expectation value estimates,

\hat{f} \approx \int \mathrm{d} \theta \, \pi(\theta \mid \tilde{y}) \, f(\theta).

The immediate challenge in implementing Bayesian inference is *computational* – how well does the estimate \hat{f} approximate the true expectation value? If our estimates are too inaccurate then we will be effectively working with a skewed posterior distribution, and any problems with those inferences could be due to the skew rather than any inherent issues in our modeling assumptions.
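To make the computational question concrete, here is a minimal sketch of a Monte Carlo estimator together with its standard error. It assumes a hypothetical conjugate normal model (unit-variance normal prior and likelihood), chosen only because the exact posterior is known, and it assumes independent exact posterior draws. The names `monte_carlo_estimate`, `y_obs`, and the data values are illustrative and not taken from any of the linked material.

```python
import math
import random
import statistics

def monte_carlo_estimate(draws, f):
    """Return the Monte Carlo estimate of E[f] and its standard error.

    Assumes the draws are independent exact samples from the posterior;
    correlated Markov chain draws would need an effective sample size
    in place of len(values).
    """
    values = [f(theta) for theta in draws]
    estimate = statistics.fmean(values)
    std_error = statistics.stdev(values) / math.sqrt(len(values))
    return estimate, std_error

# Hypothetical model: theta ~ normal(0, 1), y_n ~ normal(theta, 1).
# The posterior is then normal with precision 1 + N and mean sum(y) / (1 + N).
rng = random.Random(1)
y_obs = [0.8, 1.3, 0.4, 1.1]  # made-up observations
post_var = 1.0 / (1.0 + len(y_obs))
post_mean = post_var * sum(y_obs)

# Exact posterior draws stand in for the output of a real sampler.
draws = [rng.gauss(post_mean, math.sqrt(post_var)) for _ in range(4000)]

# Estimate the posterior mean, f(theta) = theta, and its standard error.
f_hat, mcse = monte_carlo_estimate(draws, lambda theta: theta)
print(f"estimate = {f_hat:.3f} +/- {mcse:.3f}, exact = {post_mean:.3f}")
```

Because the exact answer is available here we can check that the estimate lands within a few standard errors of it. In a real analysis we don't have that luxury, which is why diagnostics matter.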

Consequently the first step is to quantify the error in our posterior expectation value estimates. Exactly how we do this depends on the estimation method we employ – for example \hat{R} is one diagnostic that can identify pathological behavior in Markov chain Monte Carlo estimators. In general diagnosing problems in Markov chain Monte Carlo is much more subtle than just checking a few diagnostics – see for example the above link as well as Markov Chain Monte Carlo Basics.
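For a rough sense of what \hat{R} measures, the sketch below implements the classic split version: each chain is cut in half and the variance between the halves is compared to the variance within them. This is an illustration under simplifying assumptions (scalar draws, equal-length chains, no rank normalization), not necessarily the variant that any particular package reports. The function name `split_rhat` and the simulated chains are hypothetical.

```python
import math
import random
import statistics

def split_rhat(chains):
    """Classic split R-hat for a list of equal-length chains of scalar draws."""
    half = len(chains[0]) // 2
    # Split every chain in half so that drift within a chain also
    # shows up as disagreement between the pieces.
    splits = []
    for chain in chains:
        splits.append(chain[:half])
        splits.append(chain[half:2 * half])

    n = half
    split_means = [statistics.fmean(s) for s in splits]
    # Average within-split variance.
    within = statistics.fmean([statistics.variance(s) for s in splits])
    # Between-split variance, scaled by the number of draws per split.
    between = n * statistics.variance(split_means)
    # Pooled variance estimate that is inflated when the splits disagree.
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)

rng = random.Random(2)
# Four well-behaved chains: independent standard normal draws.
good = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# The same chains, but with the last one stuck in a different region.
bad = good[:3] + [[x + 3.0 for x in good[3]]]

rhat_good = split_rhat(good)
rhat_bad = split_rhat(bad)
print(f"consistent chains: {rhat_good:.3f}, one shifted chain: {rhat_bad:.3f}")
```

Values close to 1 only indicate that the chains are consistent with each other; they cannot rule out every pathology, which is the point about not relying on a few diagnostics alone.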

Once we trust our posterior computation then we can assess the adequacy of the modeling assumptions inherent to our choice of Bayesian model \pi(y, \theta). In particular we can compare how well the posterior distribution recovers features of the observed data through *posterior retrodictive checks*… Because we’re comparing to the data that we’ve already used we’re retrodicting here, not predicting. Posterior predictive checks describe comparisons to held-out data not used to inform the posterior distribution.
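A posterior retrodictive check can be sketched in a few lines: simulate one replicated data set per posterior draw, reduce each to a summary statistic, and compare the observed summary to the resulting distribution. The example below assumes a hypothetical conjugate normal model with made-up observations, and it uses the sample maximum as the summary purely for illustration. In a real analysis the summary should be designed around the features that matter for that analysis.

```python
import math
import random

def retrodictive_summaries(posterior_draws, simulate, summary):
    """Simulate one replicated data set per posterior draw and summarize each."""
    return [summary(simulate(theta)) for theta in posterior_draws]

rng = random.Random(3)
y_obs = [0.8, 1.3, 0.4, 1.1]  # made-up observations

# Hypothetical model: theta ~ normal(0, 1), y_n ~ normal(theta, 1),
# so the posterior is normal with precision 1 + N and mean sum(y) / (1 + N).
post_var = 1.0 / (1.0 + len(y_obs))
post_mean = post_var * sum(y_obs)
posterior_draws = [rng.gauss(post_mean, math.sqrt(post_var)) for _ in range(2000)]

def simulate(theta):
    """Draw a replicated data set of the same size as the observed data."""
    return [rng.gauss(theta, 1.0) for _ in y_obs]

summary = max  # illustrative choice of summary statistic

observed = summary(y_obs)
retrodicted = retrodictive_summaries(posterior_draws, simulate, summary)

# Fraction of retrodicted summaries at least as large as the observed one.
tail_fraction = sum(s >= observed for s in retrodicted) / len(retrodicted)
print(f"observed summary = {observed}, tail fraction = {tail_fraction:.2f}")
```

A tail fraction very close to 0 or 1 would suggest that the model struggles to reproduce that particular feature of the observed data. In practice a histogram of `retrodicted` with the observed value overlaid is usually more informative than the single number.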

The difficulty here is coming to terms with the fact that our model will never be perfect, but at the same time that our observations will only ever offer limited resolution of the system being observed. In other words we have to determine which features of the system are relevant and then design summaries that can focus posterior retrodictive comparisons on those behaviors. In my experience this means that the commonly recommended automated checks, which by construction cannot be tuned to the specifics of any particular analysis, have limited utility in practice.

Anyways this is all discussed in much more depth in Towards A Principled Bayesian Workflow so I’d recommend starting there. The workflow I suggest is also applied over and over again in Part III and the Case Studies on Writing - betanalpha.github.io so you can also see its benefit in action.