I am intentionally fitting a beta-binomial model to data simulated from that same beta-binomial distribution, with very small dispersion values and limited n. Compared to a binomial model, the beta-binomial takes much longer to fit and produces several divergent transition warnings as well as unreliable estimates.

Directly estimating the full posterior for a minimally overdispersed binomial can be computationally costly. So I thought it would be a good idea to recommend first getting a point estimate with Stan’s optimizer, to get a ballpark for the dispersion parameter, and then just fitting a binomial model if that estimate is too small.
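
To make the idea concrete, here is a rough sketch of that pre-check in plain Python. It is not Stan code: a crude grid search stands in for Stan’s optimizer, the mean is simply fixed at the pooled proportion, and dispersion is expressed as rho = 1 / (alpha + beta + 1), which goes to 0 as the beta-binomial approaches the binomial. The function names, the grid, and the 0.01 cutoff are all placeholders I made up for illustration, not recommended values.

```python
import math

def betabinom_logpmf(y, n, mu, rho):
    """Beta-binomial log pmf with mean mu and overdispersion rho in (0, 1)."""
    phi = (1.0 - rho) / rho                  # precision, i.e. alpha + beta
    a, b = mu * phi, (1.0 - mu) * phi        # usual shape parameters
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    log_num = math.lgamma(y + a) + math.lgamma(n - y + b) - math.lgamma(n + a + b)
    log_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return log_choose + log_num - log_den

def rho_point_estimate(ys, n):
    """Crude point estimate of rho: grid search with mu fixed at the pooled proportion."""
    mu = sum(ys) / (n * len(ys))
    mu = min(max(mu, 1e-6), 1.0 - 1e-6)      # keep the shape parameters strictly positive
    # Log-spaced grid from 1e-4 up to about 0.95 (assumed range, adjust as needed).
    grid = [10 ** (-4 + 4 * i / 200) for i in range(200)]
    return max(grid, key=lambda rho: sum(betabinom_logpmf(y, n, mu, rho) for y in ys))

def choose_model(ys, n, rho_cutoff=0.01):
    """Fall back to the binomial when the estimated overdispersion is below the cutoff."""
    return "beta-binomial" if rho_point_estimate(ys, n) > rho_cutoff else "binomial"

# Example call: counts out of n = 10 trials per group.
print(choose_model([2, 4, 3, 3, 5, 2, 3, 4], n=10))
```

In practice the point estimate would come from optimizing the actual Stan model, and the cutoff would need some justification for the problem at hand; the sketch only shows the shape of the decision.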

I feel like this is a particular case of Gelman’s folk theorem, though it’s slightly counterintuitive, since we end up selecting the “wrong” model over the actual data-generating one because of computational limitations. Does my suggested approach make sense, though?