I’ve been working a lot with beta regression lately, and I am having some issues using probabilities that are censored by floating point precision, i.e. very close to 0/1. It strikes me that the best approach to dealing with this problem is to have a beta distribution on log probabilities, rather than the (0,1) interval. Any kind of rounding can introduce some pretty big biases as the extreme values will have an outsize effect on estimates (see my paper here). In principle it doesn’t seem to hard to have a Beta distribution on \text{log } x rather than x \in (0,1), but I can’t find any examples of this in the literature. If anyone knows of one, or can see a way of straightforwardly calculating the beta distribution on \text{log } x, I’d appreciate any tips/pointers.

This new paper seems like one plausible option… a “beta prime” distribution on \frac{x}{1-x} that includes a regression version to predict the mean of \frac{x}{1-x}.

Based on the paper above, we can also model a beta regression using the following log-likelihood function for a variable x \in (0,1), where y = \frac{x}{1-x}, \mu > 0, \phi >0.

real beta_prime_lpdf(real y, real mu, real phi) {
// adapted from Bourguignon et al. (2021)
// doi: https://doi.org/10.1007/s40300-021-00203-y
return((mu*(1 + phi) - 1)*log(y) - (mu*(1 + phi) + phi + 2)*log1p(y) -
lgamma(mu*(1 + phi)) - lgamma(phi + 2) +
lgamma(mu*(1+ phi) + phi + 2));
}

It is then straightforward to translate model predictions back to probabilities by using the inverse logit function as it’s essentially modeling the log-odds. The author of the paper has a helpful R package BPmodel that implements distribution functions.

I’m putting this into ordbetareg, but I think it could be of wider use in the Stan community given the issues with overflow/underflow in the beta_proportion_lpdf function.

I am guessing putting it into @spinkney’s GitHub - spinkney/helpful_stan_functions repository would be great for starters as well (before we embark on adding it in Stan Math). What do you think Sean?

The one caveat with the above approach is that it can’t estimate all the possible shapes of the Beta distribution. The \alpha and \beta parameters are restricted to be at least +1 and +2 respectively. Apparently that’s because the mean and variance of the beta prime distribution aren’t defined otherwise.

It does work pretty well, though, given that caveat.

Given \theta \sim \textrm{beta}(\alpha, \beta), then we can let \phi = \log \theta or \psi = \log(\theta / (1 - \theta)) and work out p(\phi \mid \alpha, \beta) and p(\psi \mid \alpha, \beta) in the usual way. So let’s consider \phi, for which the change-of-variables formula yields

If \alpha and \beta are unknowns, then you need to evaluate the \Gamma(\alpha + 1), \Gamma(\beta + 1) and \Gamma(\alpha + \beta + 1), which can also be problematic.

and the main risk if you don’t have the normalizer is that 1 - \theta rounds to 1.

Now if you’re using beta_lpdf rather than beta_lupdf or the sampling statement, you can also run into problems evaluating the beta function normalizer, which I’ve just rendered as const here.