I probably focused too narrowly on the technical part. Since this is tangential, I made an outline at "Idea: A simple plugin system for CmdStan". But whatever the tech, I think a goal should be that a sampler in beta is implemented and documented in such a way that a substantial fraction of Stan users could (if they wanted) try the sampler on their problem with a reasonably short setup (say <30 minutes). This could well be achievable with a BridgeStan implementation.
That’s not true, there are many known tests for discrete uniformity. The gamma statistic from [2103.10522] Graphical Test for Discrete Uniformity and its Applications in Goodness of Fit Evaluation and Multiple Sample Comparison is particularly useful IME (and the tail quantiles of the null distribution can be evaluated, so it can be used as a test).
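As a rough illustration of what a test for discrete uniformity can look like: the sketch below is not the gamma statistic from the paper, just a plain Pearson chi-square statistic on the rank counts, with the null distribution obtained by simulation. The function names and the default `n_sims` and `seed` values are made up for the example, and ranks are assumed to be integers in `0..n_values-1`.

```python
import random
from collections import Counter


def chi_square_stat(ranks, n_values):
    """Pearson chi-square statistic against a uniform distribution on 0..n_values-1."""
    expected = len(ranks) / n_values
    counts = Counter(ranks)
    return sum(
        (counts.get(k, 0) - expected) ** 2 / expected for k in range(n_values)
    )


def uniformity_p_value(ranks, n_values, n_sims=2000, seed=0):
    """Monte Carlo p-value: share of simulated uniform samples at least as extreme."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = chi_square_stat(ranks, n_values)
    hits = 0
    for _ in range(n_sims):
        # Draw a sample of the same size under the null (exactly uniform ranks).
        sim = [rng.randrange(n_values) for _ in range(len(ranks))]
        if chi_square_stat(sim, n_values) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_sims + 1)
```

Simulating the null sidesteps the chi-square approximation, which can be poor when the expected count per rank is small. The same simulation idea carries over to other statistics, at the cost of more compute than evaluating tail quantiles directly.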
True, but even for the approximate ones, it would IMHO be good to know the extent of the miscalibration. I don’t want to push SBC too strongly. It is definitely not a hard requirement IMHO.