Prior Choice for Beta Binomial Dispersion

I’m fitting a Beta-Binomial model in brms as described in Paul Bürkner’s vignette on Custom Response Distributions. In particular, that means I’m parameterising the beta binomial as

S \sim \text{BetaBinom}(N, \mu, \phi),

where \mu describes the mean, and \phi controls overdispersion (in the classic parameterisation \alpha = \phi\mu and \beta = \phi(1-\mu)).
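To make the parameterisation concrete, here is a minimal Python sketch (standard library only) that draws from the beta binomial through its usual mixture representation: a Beta(\alpha, \beta) success probability followed by a Binomial draw. The function name `beta_binomial_draw` and the values N = 50, \mu = 0.3, \phi = 20 are illustrative assumptions, not anything taken from the brms vignette.

```python
import random

def beta_binomial_draw(n, mu, phi, rng):
    """Draw S ~ BetaBinom(n, mu, phi) with alpha = phi*mu, beta = phi*(1-mu)."""
    alpha = phi * mu
    beta = phi * (1.0 - mu)
    theta = rng.betavariate(alpha, beta)  # latent success probability
    # Binomial(n, theta) as a sum of n Bernoulli(theta) trials
    return sum(rng.random() < theta for _ in range(n))

rng = random.Random(1)            # fixed seed, purely for reproducibility
n, mu, phi = 50, 0.3, 20.0        # hypothetical illustrative values
draws = [beta_binomial_draw(n, mu, phi, rng) for _ in range(5000)]
mean_prop = sum(draws) / (n * len(draws))
print("empirical mean of S/N:", round(mean_prop, 3))
```

The empirical mean of S/N should sit close to \mu, while smaller \phi spreads the draws further around it.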

I’m interested in views on how best to go about setting a prior on \phi. I’ve detailed my thinking below, and would welcome any thoughts on approaches.

One immediate thought that comes to my mind is whether I would be better off parameterising in \phi^{-1}.


My approach is to set a prior in terms of the residual deviation that the binomial proportion S/N would have regardless of sample size, e.g. I feel comfortable making the statement:

For arbitrarily large samples, I still anticipate rates to deviate by +/-p.

For my situation, I’d typically expect p to be in the range of 1% to 5%.

Asymptotically, the variance of the beta binomial proportion is that of the Beta distribution with the same parameters:

\begin{align*} \lim_{N \rightarrow \infty} \text{Var}(S/N) & = \frac{\mu(1-\mu)}{1 + \phi} \\ & \le \frac14 \frac{1}{1+\phi} \\ & \approx \frac{1}{4\phi} \quad \text{for large } \phi. \end{align*}

For the purpose of setting a prior, the approximation feels reasonable so long as \mu is known not to be too close to 0 or 1.
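As a quick check on the limit, the sketch below compares the exact finite-N variance of S/N with its asymptote. The finite-N expression is the standard beta-binomial variance rewritten in (\mu, \phi), which the post does not state, so treat it as an added assumption; the function names and the values \mu = 0.3, \phi = 400 are hypothetical.

```python
def var_proportion(n, mu, phi):
    """Exact Var(S/N) for S ~ BetaBinom(n, mu, phi), alpha = phi*mu, beta = phi*(1-mu)."""
    # Var(S) = n * mu * (1 - mu) * (phi + n) / (phi + 1); divide by n**2
    return mu * (1.0 - mu) * (phi + n) / (n * (phi + 1.0))

def var_limit(mu, phi):
    """Limit of Var(S/N) as N grows: the variance of Beta(phi*mu, phi*(1-mu))."""
    return mu * (1.0 - mu) / (1.0 + phi)

mu, phi = 0.3, 400.0  # hypothetical illustrative values
for n in (10, 1_000, 100_000, 10_000_000):
    print(n, var_proportion(n, mu, phi), var_limit(mu, phi))
```

The first column of variances shrinks towards the second as N grows but never drops below it, which is the residual deviation the prior is meant to describe.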

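Very roughly, the asymptotic standard deviation is then about \frac{1}{2\sqrt{\phi}}, so around 95% of the density of S/N will lie within two standard deviations of the mean, i.e. in the range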

\mu \pm \frac1{\sqrt{\phi}}.

So, from my original framing in terms of my anticipated deviation p, I can approximate p \approx \phi^{-\frac12}. At this point I’m choosing a prior \phi \sim \text{Exponential}(p^2), where p^2 is the rate, so that the prior mean of \phi is p^{-2}.

Example. If my prior expectation is for deviations to be around 5%, then I’d choose a prior \phi \sim \text{Exponential}(0.05^2), which following the approximation above implies the distribution below on p.
[Image: implied prior distribution on p]
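The implied prior on p can also be summarised without a plot. Under the approximation p = \phi^{-1/2} and a rate parameterisation of the exponential, P(p \le x) = P(\phi \ge x^{-2}) = \exp(-p_0^2 / x^2), which inverts in closed form. The sketch below is a small Python illustration; `implied_p_quantile` and `p0` are hypothetical names introduced here.

```python
import math

def implied_p_quantile(q, p0):
    """Quantile of p = phi**-0.5 when phi ~ Exponential(rate = p0**2)."""
    # Invert P(p <= x) = exp(-p0**2 / x**2) = q for x
    return p0 / math.sqrt(-math.log(q))

p0 = 0.05  # anticipated deviation from the example above
for q in (0.05, 0.5, 0.95):
    print(q, round(implied_p_quantile(q, p0), 4))
```

Under these assumptions the median of p is p_0 / \sqrt{\ln 2}, roughly 1.2 times p_0, and the right tail is long: the 95% quantile is more than four times p_0.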

I’d use either an exponential prior (as you do) or a Pareto prior. I’d also try to ask whether 0 makes sense (in the limit, all the mass piles up on proportions of 0 and 1) or infinity makes sense (no overdispersion relative to binomial). The family shouldn’t matter much unless you care about tail behavior.

The other thing to consider is flipping the parameter and putting a prior on 1/\phi. That way, bigger values are more overdispersed.

Another approach would be to define

\phi = \frac{1-\theta}{\theta} \quad ,

where \theta \in [0,1]. \theta = 0 corresponds to no overdispersion, and \theta = 1 corresponds to maximal overdispersion. You can then place an appropriate Beta distribution on \theta.
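A minimal Python sketch of this reparameterisation follows. The inverse map \theta = 1/(1+\phi) follows directly from the definition, and it means the limiting variance \mu(1-\mu)/(1+\phi) of S/N equals \mu(1-\mu)\theta, so \theta is the fraction of the maximal variance \mu(1-\mu) that remains for arbitrarily large N. The function names are hypothetical, and the sketch restricts \theta to the open interval so that \phi stays finite and positive.

```python
def phi_from_theta(theta):
    """Map theta in (0, 1) to phi = (1 - theta) / theta."""
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    return (1.0 - theta) / theta

def theta_from_phi(phi):
    """Inverse map: theta = 1 / (1 + phi), for phi > 0."""
    return 1.0 / (1.0 + phi)

mu = 0.5  # hypothetical mean, used only to illustrate the variance link
for theta in (0.001, 0.01, 0.1, 0.5):
    phi = phi_from_theta(theta)
    limit_var = mu * (1.0 - mu) / (1.0 + phi)  # equals mu * (1 - mu) * theta
    print(theta, round(phi, 3), round(limit_var, 6))
```

A Beta prior on \theta concentrated near small values then corresponds to a prior belief in mild overdispersion.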

Are you sure the endpoints should be included?

Neat idea, @kholsinger.

@maxbiostat You’re right. The endpoints would correspond to a beta with zero or infinite parameters, so they shouldn’t be included. Stan works in floating point, though, so you can inadvertently get rounding or overflow to those values.
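The rounding issue is easy to reproduce in any double-precision arithmetic. The sketch below is plain Python, not Stan code: it shows an inverse-logit value rounding to exactly 1, which makes \phi = (1-\theta)/\theta exactly 0, and then a simple guard. The helper `clamp_open_unit` and its `eps` value are hypothetical choices for illustration.

```python
import math

def inv_logit(x):
    """Map an unconstrained real to the unit interval."""
    return 1.0 / (1.0 + math.exp(-x))

def clamp_open_unit(theta, eps=1e-12):
    """Keep theta strictly inside (0, 1); eps is an arbitrary illustrative margin."""
    return min(max(theta, eps), 1.0 - eps)

theta = inv_logit(40.0)           # exp(-40) is below double precision near 1
print(theta == 1.0)               # the endpoint is hit by rounding
phi_raw = (1.0 - theta) / theta   # becomes exactly 0.0

theta_safe = clamp_open_unit(theta)
phi_safe = (1.0 - theta_safe) / theta_safe
print(phi_raw, phi_safe)
```

Whether a guard like this is appropriate depends on the model; a prior that keeps \theta away from the endpoints may make it unnecessary.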

Thanks all for your input - and apologies for the slow response. Good to see that there are no objections to my proposed approach, so for now I’ll stick with it, as I like the heuristic for setting soft ranges on the overdispersion. @Bob_Carpenter - I did previously try working with \phi^{-1} but found Stan struggled to converge.

Overall, I guess what surprises me is that the parameterization I started with seems to come up often in examples (for instance Richard McElreath’s Statistical Rethinking, and the example in Paul Bürkner’s vignette on custom families in brms), and yet I haven’t been able to track down much on the practicalities of prior choice.

In McElreath’s case he takes an Exponential(1) prior, but for the reasons set out in my original post I find this implies wide overdispersion even in the presence of large volumes of data.
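To put rough numbers on that comparison, the sketch below evaluates the asymptotic standard deviation of S/N, \sqrt{\mu(1-\mu)/(1+\phi)}, at the prior median of \phi under each exponential rate. It assumes \mu = 0.5 and a rate parameterisation of the exponential; the function names are hypothetical, and a prior median is only one crude summary of each prior.

```python
import math

def asymptotic_sd(mu, phi):
    """Large-N standard deviation of S/N: sqrt(mu * (1 - mu) / (1 + phi))."""
    return math.sqrt(mu * (1.0 - mu) / (1.0 + phi))

def exponential_median(rate):
    """Median of an Exponential(rate) distribution."""
    return math.log(2) / rate

mu = 0.5  # assumed mean, the worst case for the variance
for rate in (1.0, 0.05 ** 2):
    phi_med = exponential_median(rate)
    print(rate, round(phi_med, 2), round(asymptotic_sd(mu, phi_med), 3))
```

At the prior median, Exponential(1) implies a residual standard deviation of about 0.38 on the proportion scale, against about 0.03 for Exponential(0.05^2).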