 # Beta binomial truncated takes 1000x more time than the non truncated version

Hello,

considering that in RStan `~ beta_binomial` should not be used with the current release (Very different sampling with ~beta_binomial than with += beta_binomial_lpmf - #4 by jonah), I have implemented the truncated version as shown in the manual:

``````stan
y ~ poisson(3.7);
if (y < 2 || y > 10)
  target += negative_infinity();
else
  target += -log_sum_exp(poisson_lpmf(2 | 3.7),
                         log_diff_exp(poisson_lcdf(10 | 3.7),
                                      poisson_lcdf(2 | 3.7)));
``````
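As a quick sanity check of the manual's recipe (a Python/scipy sketch, not part of the Stan model): the subtracted `log_sum_exp` term is exactly `log P(2 <= y <= 10)`, i.e. the log of the truncated mass.

```python
# Check the manual's truncation normalizer against a brute-force sum
# (illustrative only; the Stan code above is the actual model fragment).
from scipy.stats import poisson

lam, lower, upper = 3.7, 2, 10

# Manual's decomposition: P(y == lower) + (CDF(upper) - CDF(lower))
norm = poisson.pmf(lower, lam) + (poisson.cdf(upper, lam) - poisson.cdf(lower, lam))

# Direct truncation mass: P(lower <= y <= upper)
direct = sum(poisson.pmf(k, lam) for k in range(lower, upper + 1))

assert abs(norm - direct) < 1e-12
```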

My function is this:

``````stan
real beta_binomial_truncated_lpmf(int p, int exposure, real alpha, real beta, int lower, int upper) {

  real lp = 0;

  lp += beta_binomial_lpmf(p | exposure, alpha, beta);
  if (p < lower || p > upper)
    lp += negative_infinity();
  else
    lp += -log_sum_exp(
      beta_binomial_lpmf(lower | exposure, alpha, beta),
      log_diff_exp(beta_binomial_lcdf(upper | exposure, alpha, beta),
                   beta_binomial_lcdf(lower | exposure, alpha, beta))
    );

  return lp;
}
``````

However, a simple model takes an unreasonable amount of time:

``````
1000 transitions using 10 leapfrog steps per transition would take 37900 seconds.
``````

Does anyone know of any bug in either of these density calculations, or am I implementing it wrong?


If you rescale your outcome to [0,1], you could consider using my ordered beta regression. It might make sense if I’m interpreting your data correctly:

https://osf.io/preprints/socarxiv/2sx6y/

You can estimate it in brms.


Thanks, I am using the beta-binomial version because I want to see if keeping the integer data improves estimates.

The issue, it seems to me, is that you don’t use vectorisation. Maybe you can post a complete (simple) model so we can say a bit more?


So here are the two models and the R script to reproduce:

• without truncation: 1.5 seconds
• with truncation: 193.3 seconds

My suspicion is that the beta_binomial_lcdf implementation is very slow (similarly to the neg_binomial implementation).

(Again, ~beta_binomial is not working in (r)stan 2.21, so the non-vectorised version has been used here.)

NO truncation

``````stan
data {
  int N;
  int exposure[N];
  int y[N];
}
parameters {
  real<lower=0, upper=1> mu;
  real<upper=10> precision;
}
model {
  // ~ beta_binomial is buggy in rstan, fixed in 2.22
  target += beta_binomial_lpmf(y | exposure, mu * exp(precision), (1.0 - mu) * exp(precision));

  precision ~ normal(4, 2);
  mu ~ beta(2, 2);
}
``````

YES truncation

``````stan
functions {

  real beta_binomial_truncated_lpmf(int[] p, int[] exposure, real alpha, real beta, int[] lower, int[] upper) {

    real lp = 0;

    lp += beta_binomial_lpmf(p | exposure, alpha, beta);
    lp += -log_sum_exp(
      beta_binomial_lpmf(lower | exposure, alpha, beta),
      log_diff_exp(beta_binomial_lcdf(upper | exposure, alpha, beta),
                   beta_binomial_lcdf(lower | exposure, alpha, beta))
    );

    return lp;
  }

}
data {
  int N;
  int exposure[N];
  int y[N];
  int truncation_up[N];
  int truncation_down[N];
}
parameters {
  real<lower=0, upper=1> mu;
  real<upper=10> precision;
}
model {
  // ~ beta_binomial is buggy in rstan, fixed in 2.22
  target += beta_binomial_truncated_lpmf(y | exposure, mu * exp(precision), (1.0 - mu) * exp(precision), truncation_down, truncation_up);

  precision ~ normal(4, 2);
  mu ~ beta(2, 2);
}
``````

R script

``````r
library(rstan)

set.seed(42)  # fix the RNG so the example is reproducible

N = 100
exposure = as.integer(runif(N, 10, 1000))
proportion = rbeta(N, 0.2 * 100, 0.8 * 100)
y = as.integer(exposure * proportion)
truncation_up = y * 2
truncation_down = as.integer(y / 2)

mod1 = stan_model(file = "dev/reproducible_example_beta_binomial_truncated/beta_binomial.stan")

fit1 = sampling(mod1, cores = 4, data = list(
  N = N,
  y = y,
  exposure = exposure
))

mod2 = stan_model(file = "dev/reproducible_example_beta_binomial_truncated/beta_binomial_truncation.stan")

fit2 = sampling(mod2, cores = 4, data = list(
  N = N,
  y = y,
  exposure = exposure,
  truncation_up = truncation_up,
  truncation_down = truncation_down
))
``````

Then maybe use the new profiler we have in CmdStan 2.26 to confirm your suspicion?

I would definitely try with newer Stan - note that you can get rstan 2.26 easily via Repository for distributing (some) stan-dev R packages | r-packages .

Unfortunately, changes in total runtime are a somewhat unreliable way to judge implementation performance, as they will likely be dominated by changes in how easy the distribution is to explore, particularly by changes in the actual treedepth achieved. So if treedepth increased substantially, the problem is likely primarily in the geometry.

Comparing the time per gradient evaluation as reported at the start of sampling (as you did) is definitely more useful than the total runtime. One could also compute the time per gradient evaluation from the total runtime (for each iteration, the gradient is evaluated roughly 2^treedepth times) to get a less noisy estimate, but if the change in the time reported at the start of sampling is very big, then I agree the actual computation likely plays a role in the difference.

The code looks good. One thing that puzzles me about the truncation for discrete distributions as described in the manual is why you need

``````stan
target += -log_sum_exp(dist_lpmf(lower | pars),
                       log_diff_exp(dist_lcdf(upper | pars),
                                    dist_lcdf(lower | pars)));
``````

instead of just:

``````stan
target += -log_diff_exp(dist_lcdf(upper | pars),
                        dist_lcdf(lower - 1 | pars));
``````

Maybe simply because that would trigger an error if `lower == 0`? Please double-check that my math is correct, but if you never need `lower == 0`, this might be a way to save a few CPU cycles (although I would expect that the `_lcdf` calls are actually the most expensive). EDIT: Looking at the code, `beta_binomial_lcdf` seems to have almost the same number of expensive function evaluations as `beta_binomial_lpmf`, so my expectation is probably wrong.
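The identity behind this simplification is that for a discrete distribution `cdf(lower) - cdf(lower - 1) == pmf(lower)`. A quick numerical confirmation, sketched in Python with scipy rather than Stan (the parameter values here are arbitrary):

```python
# Check: pmf(lower) + (cdf(upper) - cdf(lower)) == cdf(upper) - cdf(lower - 1)
from scipy.stats import betabinom

n, a, b = 50, 2.0, 5.0     # arbitrary beta-binomial parameters
lower, upper = 5, 20       # truncation bounds with lower > 0

# Form used in the manual: mass at the lower bound plus the strict interior
manual_form = betabinom.pmf(lower, n, a, b) + (
    betabinom.cdf(upper, n, a, b) - betabinom.cdf(lower, n, a, b))

# Proposed simpler form: one CDF difference, shifted by one at the bottom
simplified = betabinom.cdf(upper, n, a, b) - betabinom.cdf(lower - 1, n, a, b)

assert abs(manual_form - simplified) < 1e-12
```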

Hope that helps at least a bit!


That’s amazing. Although I am very tempted, all the models I build end up in R tools and packages, so I cannot expect the users to install the non-CRAN rstan :(

Unless of course I can add the non-standard repository `https://mc-stan.org/r-packages/` in the DESCRIPTION file of my R package. I’ll have to investigate.

I am not sure if I follow. Basically, in tests both models result in a well-behaved posterior; however, the truncated model is 100X slower. Unfortunately, I did not do thorough benchmarking, but clearly the only difference is the use of the lcdf functions.

As I am using this for modelling outlier elimination, I will try to use the truncated distribution only for the components where I eliminate data, and the non-truncated one for the rest (very model-specific, sorry: I have many beta binomials, and some of them include outliers).

It does as always ;)

Oh, I think I just noticed an important bug:

You need to iterate over the observations and compute the `_lcdf` and `log_sum_exp` separately for each observation. The vectorised `_lpmf` and `_lcdf` calls return the sum of all the log masses, but you want to `log_sum_exp` first for each observation and only then sum the results of the `log_sum_exp`.
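To see numerically why the two orders of operations differ, here is a small Python/scipy sketch (arbitrary data; `log_diff_exp(a, b)` is written out as `log(exp(a) - exp(b))` for clarity):

```python
# Sum of per-observation log-normalizers vs. log_diff_exp of summed lcdfs:
# these are genuinely different quantities.
import numpy as np
from scipy.stats import betabinom

n = np.array([30, 40])          # two observations with different exposures
a, b = 2.0, 5.0
lower = np.array([2, 3])
upper = np.array([15, 20])

lcdf_up = betabinom.logcdf(upper, n, a, b)
lcdf_lo = betabinom.logcdf(lower - 1, n, a, b)

# Correct: log_diff_exp per observation, then sum over observations
correct = np.sum(np.log(np.exp(lcdf_up) - np.exp(lcdf_lo)))

# Broken: sum the lcdfs first (what a vectorised Stan _lcdf call returns),
# then apply a single log_diff_exp to the sums
broken = np.log(np.exp(lcdf_up.sum()) - np.exp(lcdf_lo.sum()))

assert abs(correct - broken) > 1e-3   # the results disagree
```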

And as you have the loop, you can easily branch to include the truncation only when needed, which also lets you handle the `lower = 0` and `upper = exposure[i]` cases separately and thus have somewhat more efficient code that avoids some of the `_lpmf` calls (code not tested):

``````stan
real beta_binomial_truncated_lpmf(int[] p, int[] exposure, real alpha, real beta, int[] lower, int[] upper) {

  real lp = 0;

  // Vectorizes safely
  lp += beta_binomial_lpmf(p | exposure, alpha, beta);
  for (i in 1:num_elements(p)) {
    if (lower[i] == 0) {
      if (upper[i] != exposure[i]) {
        // Upper only
        lp += -beta_binomial_lcdf(upper[i] | exposure[i], alpha, beta);
      }
    } else {
      if (upper[i] != exposure[i]) {
        // Both
        lp += -log_diff_exp(
          beta_binomial_lcdf(upper[i] | exposure[i], alpha, beta),
          beta_binomial_lcdf(lower[i] - 1 | exposure[i], alpha, beta));
      } else {
        // Lower only
        lp += -beta_binomial_lccdf(lower[i] - 1 | exposure[i], alpha, beta);
      }
    }
  }
  return lp;
}
``````
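For readers who want to verify the branching logic outside Stan, here is a hypothetical Python mirror of the function above (function name and test values are mine), checked against brute-force renormalization of the truncated pmf:

```python
# Illustrative Python mirror of the branched truncated lpmf, for one
# observation; logsf is the log complementary CDF (Stan's _lccdf).
import math
from scipy.stats import betabinom

def truncated_betabinom_logpmf(y, n, a, b, lower, upper):
    lp = betabinom.logpmf(y, n, a, b)
    if lower == 0:
        if upper != n:
            # Upper only
            lp -= betabinom.logcdf(upper, n, a, b)
    else:
        if upper != n:
            # Both bounds
            lp -= math.log(betabinom.cdf(upper, n, a, b)
                           - betabinom.cdf(lower - 1, n, a, b))
        else:
            # Lower only
            lp -= betabinom.logsf(lower - 1, n, a, b)
    return lp

# Check against direct renormalization over the truncation interval
n, a, b, lower, upper, y = 30, 2.0, 5.0, 3, 12, 7
mass = sum(betabinom.pmf(k, n, a, b) for k in range(lower, upper + 1))
direct = math.log(betabinom.pmf(y, n, a, b) / mass)
assert abs(truncated_betabinom_logpmf(y, n, a, b, lower, upper) - direct) < 1e-10
```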

Thanks, Martin!
You have a good eye.

Thankfully, following your other suggestion :) I have updated to rstan 2.26 and was able to use the form T[…, …]. The whole problem went away, and the truncated version is not much slower than the standard one.


I’m sorry to say that I overlooked the timing. When using beta_binomial() with T[x, y] I get an incredible increase in runtime with rstan 2.26.

Sorry for the hype :) I think it simply takes very long even in the native implementation:

``````
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 37596.5 seconds.
``````