Prior for kappa in beta-binomial

Hi all,

I’m trying to come up with a decent prior for \kappa in the beta_proportion distribution in Stan. As I understand it, \mu is the mean probability while \kappa determines how concentrated the distribution is around it. Compared to the regular Beta distribution parameterised with shapes \alpha and \beta, the beta_proportion parameterisation relates to the shapes as:

\begin{aligned} \alpha &= \mu \cdot \kappa \\ \beta &= (1 - \mu) \cdot \kappa \end{aligned}

While doing some prior predictive simulations sampling \mu \sim \operatorname{Beta}(1, 1) and \kappa \sim \operatorname{Exponential}(1), I noticed I was getting some unfortunate bathtub-shaped distributions. I asked ChatGPT, which suggested I set a lower bound on \kappa of \operatorname{max}(\mu, 1 - \mu) to avoid the bathtubs. I think this solution probably works fine, but I was wondering if anyone had a (perhaps more principled) prior distribution for \kappa?
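For reference, here is a minimal sketch of the kind of prior predictive simulation I mean (run with algorithm=fixed_param), using the Beta(1, 1) and Exponential(1) priors mentioned above:

generated quantities {
  real mu = beta_rng(1, 1);
  real kappa = exponential_rng(1);
  real theta = beta_proportion_rng(mu, kappa);  // draws of theta come out bathtub-shaped
}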

Thanks!

Matt

Hi Matthijs, I would suggest using a log-normal distribution for kappa. It’s not very principled, but I think there is one main reason why it makes sense (to me):

  • the concentration of the beta distribution is much more sensitive to changes in the value of kappa when kappa is closer to zero

I would further recommend that you check the implied prior for the two shape parameters, given your choice of prior for mu and kappa. Whenever one of the two shape parameters is <1, you’ll get a J-shaped distribution, and when both are <1 you get a U-shaped distribution (your bathtub). So maybe do a few checks to see what the prior probability is that either of the shape parameters is <1. This probability increases for values of mu closer to zero or one, and, of course, lower values of kappa.
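A minimal sketch of that check, with a Beta(1, 1) prior on mu and a lognormal prior on kappa as placeholders for whatever you end up using (run with algorithm=fixed_param and average the indicator over draws to estimate the prior probability):

generated quantities {
  real mu = beta_rng(1, 1);            // placeholder prior for mu
  real kappa = lognormal_rng(0, 1);    // placeholder prior for kappa
  real a = mu * kappa;                 // implied first shape parameter
  real b = (1 - mu) * kappa;           // implied second shape parameter
  int shape_below_one = (a < 1) || (b < 1);  // 1 whenever the implied beta is J- or U-shaped
}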

So I think I’ve made some progress, inspired by the PC prior for the dispersion term of the negative binomial distribution. In Stan’s negative_binomial_2 parameterisation, the negative binomial simplifies to the Poisson as \phi \to\infty, so the recommendation is to place a prior on \phi^\prime = \phi^{-\frac{1}{2}}.
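For reference, a minimal sketch of that negative binomial recommendation (the half-normal on \phi^\prime is just an assumed choice, and the likelihood is omitted):

parameters {
  real<lower=0> phi_prime;           // phi' = phi^(-1/2)
}
transformed parameters {
  real phi = 1 / square(phi_prime);  // recovers the negative_binomial_2 dispersion
}
model {
  phi_prime ~ normal(0, 1);          // assumed half-normal prior on phi'
}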

I am interested in placing a prior on \kappa in a beta-binomial model, but using something like a PC prior. The base model for the beta-binomial would be the binomial with fixed \mu. As \kappa \to \infty, the beta_proportion distribution collapses to a point mass centered on \mu. So I tried placing a prior on \kappa^\prime = \kappa^{-\frac{1}{2}}, analogous to the negative binomial case.

Also, to avoid bathtub-shaped priors, I used \sqrt{\min(\mu, 1 - \mu)} as an upper bound for \kappa^\prime, so in Stan:

parameters {
  real<lower=0, upper=1> mu;
  real<lower=0, upper=sqrt(min([mu, 1 - mu]))> kappa_prime;  // implicit uniform prior; keeps kappa > 1 / min(mu, 1 - mu)
}
transformed parameters {
  real kappa = 1 / square(kappa_prime);
}

Would love to hear some feedback on this.

EDIT: Stan really struggles when kappa = 1 / square(kappa_prime), probably due to the geometry of very high values of kappa. However, kappa = 1 / kappa_prime also works. I think an exponential prior on the reciprocal of kappa is therefore a decent starting point: it penalises complexity and collapses to a binomial distribution when 1 / \kappa = 0 (i.e. as \kappa \to \infty).
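A minimal sketch of that reciprocal parameterisation (the exponential rate of 1 is just a placeholder):

parameters {
  real<lower=0, upper=1> mu;
  real<lower=0> inv_kappa;       // 1 / kappa
}
transformed parameters {
  real kappa = 1 / inv_kappa;
}
model {
  inv_kappa ~ exponential(1);    // placeholder rate; inv_kappa = 0 recovers the binomial base model
}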

This is important:

In Gelman et al.'s Bayesian Data Analysis (available as a free pdf from the book’s home page), they go over this example in Chapter 5 and discuss exactly this property.

If you use a lognormal distribution, you will avoid estimates that get too close to zero because

\lim_{Y \rightarrow 0} \ \text{lognormal}(Y \mid \mu, \sigma) = 0

(i.e., it’s a so-called “zero avoiding” prior). This can be helpful precisely because the beta distribution goes bonkers as \kappa \rightarrow 0.
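As a sketch, a lognormal prior on kappa might look like this (the hyperparameters are placeholders to tune with prior predictive checks):

parameters {
  real<lower=0, upper=1> mu;
  real<lower=0> kappa;
}
model {
  mu ~ beta(1, 1);                 // placeholder prior for mu
  kappa ~ lognormal(log(10), 1);   // placeholder: median 10, density goes to 0 as kappa -> 0
}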

Gelman et al. suggested the prior

p(\kappa) \propto \kappa^{-5/2}, \qquad \qquad (eq. 5.10)

which is a Pareto distribution. The Pareto needs a lower bound \epsilon > 0 in order to be normalizable (Gelman et al.'s prior over (0, \infty) is improper as presented).
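A minimal sketch of a proper version in Stan, using an arbitrary lower bound of 2; since pareto(y_min, alpha) has density proportional to y^{-(alpha + 1)}, setting alpha = 1.5 gives the \kappa^{-5/2} tail:

parameters {
  real<lower=0, upper=1> mu;
  real<lower=2> kappa;        // the lower bound epsilon = 2 is an arbitrary choice
}
model {
  kappa ~ pareto(2, 1.5);     // density proportional to kappa^(-5/2) above the bound
}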

This is always good advice. You can see Gelman et al. scatter plotting joint prior draws of (\log(\alpha / \beta), \log(\alpha + \beta)) in Figure 5.2. This joint distribution depends on the distribution chosen for the mean \alpha / (\alpha + \beta), and you can explore it using prior predictive checks.
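A minimal sketch of generating those draws under the beta_proportion parameterisation (the priors here are placeholders); note that \alpha / \beta = \mu / (1 - \mu) and \alpha + \beta = \kappa:

generated quantities {
  real mu = beta_rng(1, 1);               // placeholder prior for mu
  real kappa = exponential_rng(1);        // placeholder prior for kappa
  real log_ratio = log(mu / (1 - mu));    // log(alpha / beta)
  real log_total = log(kappa);            // log(alpha + beta)
}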


I love PC priors, but find them hard to formulate. In my understanding, the penalized complexity approach fits both models (the simpler and richer model), then fits a mixture term that weights them according to how much of the total variance they explain. What you’re doing is what I would call “shrinkage” or “regularization.” You have just formulated a prior that concentrates mass around smaller values of \kappa.

To avoid poles in the prior, you need to satisfy

\mu \cdot \kappa > 1 and (1 - \mu) \cdot \kappa > 1,

which means

\kappa > 1 / \mu and \kappa > 1 / (1 - \mu).

You can code this directly as follows.

real<lower=0, upper=1> mu;
real<lower=fmax(1 / mu, 1 / (1 - mu))> kappa;

or if you want to be clever and avoid an unnecessary division,

real<lower=1 / fmin(mu, 1 - mu)> kappa;

You can further minimize arithmetic using short-circuiting

real<lower=1 / (mu < 0.5 ? mu : 1 - mu)> kappa;

The latter only evaluates 1 - mu when mu >= 0.5.

You’re going to find that this will struggle to sample when the data is consistent with kappa = 0.
