Continuous Bernoulli

@StaffanBetner I was playing around with the continuous Bernoulli, and when p is a vector I found I needed to divide the normalizing constant by the sample size.

The Stan code is below. Note that since Stan 2.29 you can declare the same function name with different signatures: one takes a vector of p values and the other a real p. Does this seem right?

 real continuous_bernoulli_lpdf(vector y, vector lambda) {
   int N = num_elements(y);
   real lp = N * log2();
   real lp_c = 0;
   int counter = 0;

   for (n in 1:N) {
     lp += y[n] * log(lambda[n]) + (1 - y[n]) * log1m(lambda[n]);
     // the log normalizing term is undefined at lambda = 0.5 (its limit there is 0)
     if (lambda[n] != 0.5) {
       counter += 1;
       lp_c += log(atanh(1 - 2 * lambda[n]) / (1 - 2 * lambda[n]));
     }
   }
   return lp + lp_c / counter;
 }
  
 real continuous_bernoulli_lpdf(vector y, real lambda) {
   int N = num_elements(y);
   real lp = N * log2() + sum(y * log(lambda) + (1 - y) * log1m(lambda));

   if (lambda != 0.5) {
     lp += log(atanh(1 - 2 * lambda) / (1 - 2 * lambda));
   }

   return lp;
 }

Hi @spinkey - did you ever figure out which of these is better?

Hi - check out this blog post. I added the distribution to brms:

Hi @saudiwin, neat post! I have a residual confusion, though: you mention that in your ordered beta paper you found the fractional logit gave wildly varying performance and you didn't recommend it. In the blog post, you suggest the distribution was fixed by normalizing it. But including or excluding the normalization term should have no effect on the Stan model. Is the implication that the normalized model (i.e. the continuous Bernoulli) should still perform badly in simulation (or at least in the simulation you carried out)?

This is not normalization in the sense of the denominator in Bayes' formula and Stan sampling, but normalization of the fractional logit density so that it integrates to 1. You can fit the "fractional logit" model in Stan, and I did in my simulation, but you can't simulate from it because it has no CDF: it doesn't integrate to 1. If you plug the density into Wolfram Alpha and integrate it, it pops back out the formula the authors use as a "normalizing constant", i.e. the factor that makes the function integrate to 1.
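The integrate-to-1 claim is easy to check numerically. A quick sketch, assuming NumPy and SciPy are available (function names here are just for illustration):

```python
import numpy as np
from scipy.integrate import quad

def log_norm_const(lam):
    # log C(lambda) = log2 + log(atanh(1 - 2*lambda) / (1 - 2*lambda)),
    # matching the Stan code above; valid for lambda != 0.5
    return np.log(2.0) + np.log(np.arctanh(1 - 2 * lam) / (1 - 2 * lam))

def cb_pdf(y, lam):
    # continuous Bernoulli density: C(lambda) * lambda^y * (1 - lambda)^(1 - y)
    return np.exp(log_norm_const(lam)) * lam**y * (1 - lam)**(1 - y)

for lam in (0.1, 0.3, 0.9):
    total, _ = quad(cb_pdf, 0.0, 1.0, args=(lam,))
    print(lam, total)  # each integral should come out ~1.0
```

Without the `log_norm_const` factor the same integrals come out far from 1, which is exactly the fractional-logit-versus-continuous-Bernoulli difference.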

You can simulate from the continuous Bernoulli, and I could have (and should have) included it in the simulation I ran, but I didn't know it existed at the time; the first paper only came out in 2019.
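Because the continuous Bernoulli has a closed-form CDF (given in the 2019 paper), simulating from it by inverse-CDF sampling is straightforward. A sketch assuming NumPy; `cb_rng` is an illustrative name, not an established API:

```python
import numpy as np

def cb_rng(lam, size, seed=None):
    """Draw from the continuous Bernoulli by inverting its CDF,
    F(y) = (lam^y * (1 - lam)^(1 - y) + lam - 1) / (2*lam - 1) for lam != 0.5."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=size)
    if abs(lam - 0.5) < 1e-12:
        return u  # at lam = 0.5 the distribution is uniform on [0, 1]
    # solving F(y) = u for y gives:
    return np.log((u * (2 * lam - 1) + 1 - lam) / (1 - lam)) / np.log(lam / (1 - lam))

draws = cb_rng(0.3, 100_000, seed=1)
print(draws.min(), draws.max(), draws.mean())
```

The sample mean should land near the theoretical mean lam/(2*lam - 1) + 1/(2*atanh(1 - 2*lam)), which is about 0.43 at lam = 0.3.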

At first blush the continuous Bernoulli looks like a simpler parameterization than the beta, but I have some doubts, as I find the discontinuity rather strange. The beta distribution also has lots of established properties, while the continuous Bernoulli has very few. Still, it is clear that the continuous Bernoulli does "work", and it has fewer parameters than the beta.
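On the discontinuity: it appears to be removable, since atanh(1 - 2*lam) / (1 - 2*lam) tends to 1 as lam approaches 0.5, so the normalizing constant tends smoothly to 2 (the uniform case). A quick numeric check, assuming NumPy:

```python
import numpy as np

def norm_const(lam):
    # C(lambda) = 2 * atanh(1 - 2*lambda) / (1 - 2*lambda), lambda != 0.5
    t = 1 - 2 * lam
    return 2 * np.arctanh(t) / t

# approaching lambda = 0.5 from both sides: C tends to 2
for lam in (0.4, 0.49, 0.499, 0.501, 0.51, 0.6):
    print(lam, norm_const(lam))
```

So the special-casing at lambda = 0.5 in the Stan code above is a numerical guard against 0/0, not a genuine jump in the density.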

Sorry if I'm being dense. Can we not drop the normalization term, as we do with Stan functions ending in the _lupdf suffix?

I mean yes for computational convenience, but you still need it at the end of sampling. Been a while since I looked at the specifics.

This is different, though: it's not about computational challenges but rather whether and how to normalise to make a proper PDF. Computationally it's straightforward to do.

From the Stan manual:

The built in distribution functions in Stan are all available in normalized and unnormalized form. The normalized forms include all of the terms in the log density, and the unnormalized forms drop terms which are not directly or indirectly a function of the model parameters.

So you can't do that with the continuous Bernoulli, because there is only one parameter and the normalizing constant is a function of that parameter. There's no way to drop it, sample, then renormalize, as far as I can tell.

You can fit the conventional fractional logit (without the normalizing constant) in Stan so long as you have priors on the coefficients, but you can't convert it back to the continuous Bernoulli, per the proofs in the linked paper (which seem correct).

Oh I see! Sorry; was being dense! When I saw “normalizing constant” I assumed that it was constant in the parameters.

No, it's not, which makes "normalizing constant" a misnomer in the cited paper; "normalizing function" would be more accurate. It's only constant for a given value of the parameter, which of course isn't constant in any meaningful sense.
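The "normalizing function" point can be seen numerically: the term the paper calls a constant varies with lambda, so it cannot be dropped the way Stan's _lupdf forms drop true constants. A small sketch, assuming NumPy:

```python
import numpy as np

def log_norm(lam):
    # log of the "normalizing constant": it depends on lambda, so it changes
    # the shape of the log density and cannot be dropped during sampling
    t = 1 - 2 * lam
    return np.log(2 * np.arctanh(t) / t)

for lam in (0.05, 0.2, 0.45):
    print(lam, log_norm(lam))  # clearly different values across lambda
```

A proportionality constant in the Bayes-denominator sense would print the same number for every lambda; here it doesn't, which is the whole issue.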