Hi Daniel,
I’ve had a similar question myself: is it fair to say that Stan consists of a ‘library of validated inference algorithms’?
Following @bgoodri’s recommendation, you may want to start here: [Validation of Software for Bayesian Models Using Posterior Quantiles, Cook et al.](https://amstat.tandfonline.com/doi/abs/10.1198/106186006X136976)
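
In case it helps to see the basic idea behind that paper before reading it, here is a minimal sketch, not the authors' code. It assumes a toy conjugate normal model that I picked only because its posterior is known exactly, and the function names (`posterior_quantiles`, `decile_counts`) are made up for illustration. The exact posterior draws stand in for whatever sampler you actually want to check. If the software is correct, the quantile of the true parameter within its posterior draws should look uniform on [0, 1] across replications. The paper goes further with a formal test, which is not reproduced here.

```python
import random


def posterior_quantiles(n_reps=500, n_obs=10, n_draws=200, seed=0):
    """Quantile of the true parameter among posterior draws, per replication.

    Toy model (an assumption for illustration only):
        theta ~ Normal(0, 1),  y_i | theta ~ Normal(theta, 1)
    so the exact posterior is Normal(sum(y) / (n + 1), 1 / (n + 1)).
    """
    rng = random.Random(seed)
    quantiles = []
    for _ in range(n_reps):
        theta_true = rng.gauss(0.0, 1.0)  # draw the parameter from the prior
        y = [rng.gauss(theta_true, 1.0) for _ in range(n_obs)]  # simulate data

        post_mean = sum(y) / (n_obs + 1)
        post_sd = (1.0 / (n_obs + 1)) ** 0.5
        # Stand-in for the software under test: exact posterior draws.
        draws = [rng.gauss(post_mean, post_sd) for _ in range(n_draws)]

        # Fraction of posterior draws below the true value.
        quantiles.append(sum(d < theta_true for d in draws) / n_draws)
    return quantiles


def decile_counts(quantiles):
    """Bin quantiles into ten equal-width bins on [0, 1]."""
    counts = [0] * 10
    for q in quantiles:
        counts[min(int(q * 10), 9)] += 1
    return counts


quantiles = posterior_quantiles()
counts = decile_counts(quantiles)
print(counts)  # roughly equal counts per bin if the "sampler" is correct
```

To check a real sampler, you would replace the line that generates `draws` with draws from the software being validated. A histogram that piles up at the ends or in the middle would suggest something is off.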