Underdispersed binomial GLM




I am fitting a generalised linear model to binomial data. Although I know overdispersion is much more common, I am pretty sure that my data are actually underdispersed. I have two clues for this. First, in 99.8% of replications, the generated data had greater variance than the observed data. Second, plots of the residuals against the number of successes show a decreasing linear trend, indicating that when the predicted number of successes is low, the observed number of successes is actually higher, and vice versa.

For overdispersed data, I have previously used the beta-binomial, but I am unsure what to do with these underdispersed data. Does anyone have any advice on how to handle this in Stan?

Please let me know if you need any additional information!



The binomial doesn’t give you independent control over the variance: it’s always N * theta * (1 - theta). That’s also a lower bound on the variance of a beta-binomial with the same mean, so the beta-binomial can only add dispersion, never remove it. This much variance is inherent in independently drawing each of the Bernoulli outcomes that make up the binomial. If your data are underdispersed, then they’re not repeated independent Bernoulli trials. That raises the obvious question of what they are, because that’s where you want to look for clues about how to model them.
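
To make the lower-bound claim concrete, here are the two variances side by side. The symbols alpha and beta are the usual beta-binomial shape parameters, which I'm introducing here for illustration, with theta = alpha / (alpha + beta) so that both distributions have the same mean N * theta.

```latex
\begin{aligned}
\text{binomial:} \quad
  \operatorname{Var}[y] &= N\,\theta\,(1-\theta) \\[4pt]
\text{beta-binomial:} \quad
  \operatorname{Var}[y] &= N\,\theta\,(1-\theta)
     \left(1 + \frac{N-1}{\alpha+\beta+1}\right),
  \qquad \theta = \frac{\alpha}{\alpha+\beta}
\end{aligned}
```

Because alpha and beta are positive, the factor in parentheses is at least 1, and it only approaches 1 as alpha + beta grows without bound. So no choice of beta-binomial parameters can push the variance below the binomial value.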

There’s a generalized Poisson that can have lower dispersion than the Poisson. I have no idea how hard it is to code its pmf.
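
For what it's worth, here is a minimal sketch of the log pmf in Python, assuming the Consul parameterization of the generalized Poisson, P(y) = theta * (theta + lam * y)^(y - 1) * exp(-(theta + lam * y)) / y!. The function name and the argument names `theta` and `lam` are my own choices, not anything from Stan. Note that this is a distribution over unbounded counts, so it knows nothing about the upper bound N of a binomial outcome; treat it as a starting point rather than a drop-in replacement.

```python
import math


def gen_poisson_logpmf(y: int, theta: float, lam: float) -> float:
    """Log pmf of the generalized Poisson (Consul parameterization, assumed).

    theta > 0 is the rate-like parameter and lam < 1 controls dispersion:
    lam = 0 recovers the ordinary Poisson, lam < 0 gives underdispersion.
    """
    if y < 0 or theta <= 0 or lam >= 1:
        return -math.inf
    base = theta + lam * y
    if base <= 0:
        # With lam < 0 the support is cut off once theta + lam * y <= 0.
        return -math.inf
    return (math.log(theta) + (y - 1) * math.log(base)
            - base - math.lgamma(y + 1))


# Numerical check of the dispersion for one underdispersed setting.
theta, lam = 10.0, -0.2
probs = [math.exp(gen_poisson_logpmf(y, theta, lam)) for y in range(50)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs)) / total
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs)) / total
print(f"mean={mean:.3f} var={var:.3f}")
```

With negative `lam` the pmf is truncated, so it may not sum exactly to one; the check above renormalizes before computing the moments. The log pmf itself is a single line of arithmetic, so writing the same expression as a user-defined function in Stan should be straightforward, though I haven't tested that.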

Or, if the data size is large enough, you can use a continuous approximation.
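
One way to read "continuous approximation" is a normal likelihood centred on N * theta whose variance is the binomial variance scaled by a free dispersion factor. That reading, the factor `phi`, and the function name below are my assumptions, not something established in this thread. A minimal sketch:

```python
import math


def normal_approx_logpdf(y: float, n: int, theta: float, phi: float) -> float:
    """Normal log density with mean n*theta and variance phi*n*theta*(1-theta).

    phi is a hypothetical dispersion factor: phi = 1 matches the binomial
    variance, phi < 1 allows underdispersion, phi > 1 overdispersion.
    """
    mean = n * theta
    var = phi * n * theta * (1.0 - theta)
    return -0.5 * math.log(2.0 * math.pi * var) - (y - mean) ** 2 / (2.0 * var)


# Compare the density at the mean for binomial-level and reduced variance.
at_mean_binomial = normal_approx_logpdf(50, 100, 0.5, 1.0)
at_mean_under = normal_approx_logpdf(50, 100, 0.5, 0.25)
print(at_mean_binomial, at_mean_under)
```

The approximation ignores the discreteness of the counts and the bounds at 0 and N, so it is only reasonable when N is large and theta is not too close to 0 or 1.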