This is, I think, at the very edge of topics appropriate for this board, but it seems relevant enough, and people here will probably have more carefully thought-out views on it than I do.
A group in our department recently discussed a paper that implemented a bespoke Bayesian model (not in Stan in this case, but the question would be just as relevant here). The methods section provided no evidence that the model was sampling from the correct posterior (i.e. that it was implemented correctly), and nothing in the paper indicated that this had even been checked. If you were asked to review a paper like this, where the model was coded from scratch rather than through a standard front end (which would presumably have been more thoroughly tested), would you feel justified in demanding that the authors run a calibration check, such as simulation-based calibration (SBC)?
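For anyone unfamiliar with the idea, here is a minimal sketch of what such a check looks like. It uses a toy conjugate normal model where the exact posterior stands in for "the sampler under test"; all function names and parameter choices are illustrative, not from the paper in question. The logic: draw a parameter from the prior, simulate data from it, sample the posterior, and record the rank of the true parameter among the posterior draws. If the sampler targets the correct posterior, those ranks are uniformly distributed.

```python
import numpy as np

rng = np.random.default_rng(0)

def sbc_ranks(n_sims=1000, n_obs=10, n_draws=99):
    """Run n_sims SBC iterations for the model
    theta ~ N(0, 1), y_i | theta ~ N(theta, 1).
    Returns the rank of the true theta among n_draws posterior
    draws in each iteration; correct sampling implies these ranks
    are uniform on {0, ..., n_draws}."""
    ranks = np.empty(n_sims, dtype=int)
    for s in range(n_sims):
        theta = rng.normal(0.0, 1.0)             # draw from the prior
        y = rng.normal(theta, 1.0, size=n_obs)   # simulate data
        # Exact conjugate posterior stands in for the sampler under test;
        # in practice you would plug in your hand-coded MCMC here.
        post_var = 1.0 / (1.0 + n_obs)
        post_mean = post_var * y.sum()
        draws = rng.normal(post_mean, np.sqrt(post_var), size=n_draws)
        ranks[s] = np.sum(draws < theta)         # rank statistic
    return ranks

ranks = sbc_ranks()
# Bin the ranks; a roughly flat histogram is consistent with a
# correctly implemented sampler, while systematic shapes (U, hump,
# skew) diagnose specific kinds of miscalibration.
counts, _ = np.histogram(ranks, bins=10, range=(0, 100))
print(counts)
```

A buggy sampler (say, one with a misplaced variance term) would show up here as a visibly non-uniform rank histogram, which is exactly the kind of evidence I would want to see in a methods section.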
For the record, my instinct is yes: I would ask for it. Given how easy it is to miss coding errors in a hand-rolled sampler, I can't see how the results can be trusted without it.