Beta_binomial fitting issues

Hi all,

I am having issues using beta_binomial. I started with trying to fit m12.1 in Statistical Rethinking (beta_binomial fit to classic UC Berkeley admissions data). The ulam generated stancode follows.

data{
    int N[12];
    int A[12];
    int gid[12];
}
parameters{
    vector[2] a;
    real<lower=0> phi;
}
transformed parameters{
    real theta;
    theta = phi + 2;
}
model{
    vector[12] pbar;
    phi ~ exponential( 1 );
    a ~ normal( 0 , 1.5 );
    for ( i in 1:12 ) {
        pbar[i] = a[gid[i]];
        pbar[i] = inv_logit(pbar[i]);
    }
    A ~ beta_binomial( N , pbar*theta , (1-pbar)*theta );
}

Warnings such as maximum treedepth, large R-hat, low ESS etc. are seen. I tried adjusting the priors with no luck.

Separating the beta_binomial into hierarchical pbar ~ beta(alpha,beta) and admits ~ binomial(apps, pbar) worked (code below). Moreover, the posterior samples (of a, phi, theta) had very similar summary statistics to the ones in Statistical Rethinking (R code 12.3 on pg. 372).

data{
  int N;
  int admits[N];
  int apps[N];
  int gid[N];
}

parameters{
  vector[2] a;
  real<lower=0> phi;
  vector<lower=0, upper=1>[N] pbar;
  
}

transformed parameters{
  vector[N] mu;
  vector[N] alpha;
  vector[N] beta;
  real<lower=2> theta;

  theta = phi + 2;
  for (i in 1:N){
    mu[i] = inv_logit(a[gid[i]]);
  }
  
  alpha = mu * theta;
  beta = (1-mu) * theta;
  

}

model{
  pbar ~ beta(alpha, beta);
  admits ~ binomial(apps, pbar);

  a ~ normal(0, 1.5);
  phi ~ exponential(1);
  
}

Should the separation into beta then binomial not be equivalent to using beta_binomial in one step? (some naming and other slight changes to practice my stan coding…). I also tried more or less exactly the above code but recombining into beta_binomial, which makes the beforeseen errors resurface.

data{
  int N;
  int admits[N];
  int apps[N];
  int gid[N];
}

parameters{
  vector[2] a;
  real<lower=0> phi;
  vector<lower=0, upper=1>[N] pbar;
  
}

transformed parameters{
  vector[N] mu;
  vector[N] alpha;
  vector[N] beta;
  real<lower=2> theta;

  theta = phi + 2;
  for (i in 1:N){
    mu[i] = inv_logit(a[gid[i]]);
  }
  
  alpha = mu * theta;
  beta = (1-mu) * theta;
  

}

model{

  
  admits ~ beta_binomial(apps, alpha, beta);

  a ~ normal(0, 1.5);
  phi ~ exponential(1);
  
}

Would appreciate any illumination into how to use beta_binomial better, or if my assumption of the equivalent model specifications is wrong. Thank you!

On the last model, if you’ve integrated out the betas again you don’t need pbar to be defined as a parameter. See if that helps. I agree it is weird the first model didn’t work, but I am curious to know if pbar is the problem with the third model first.

Ah yes I think that was an artifact of starting the second stan file via copy paste… just ran the following and the same issues persist

data{
  int N;
  int admits[N];
  int apps[N];
  int gid[N];
}

parameters{
  vector[2] a;
  real<lower=0> phi;

}

transformed parameters{
  vector[N] mu;
  vector[N] alpha;
  vector[N] beta;
  real<lower=2> theta;

  theta = phi + 2;
  for (i in 1:N){
    mu[i] = inv_logit(a[gid[i]]);
  }
  
  alpha = mu * theta;
  beta = (1-mu) * theta;
  

}

model{

  
  admits ~ beta_binomial(apps, alpha, beta);

  a ~ normal(0, 1.5);
  phi ~ exponential(1);
  
}

Can you post the data for this here or send me a link? This is strange. It seems like since this was the published example model it would work well and usually integrating out variables is good.

The .csv is:

ā€œdeptā€,ā€œapplicant.genderā€,ā€œadmitā€,ā€œrejectā€,ā€œapplicationsā€
ā€œAā€,ā€œmaleā€,512,313,825
ā€œAā€,ā€œfemaleā€,89,19,108
ā€œBā€,ā€œmaleā€,353,207,560
ā€œBā€,ā€œfemaleā€,17,8,25
ā€œCā€,ā€œmaleā€,120,205,325
ā€œCā€,ā€œfemaleā€,202,391,593
ā€œDā€,ā€œmaleā€,138,279,417
ā€œDā€,ā€œfemaleā€,131,244,375
ā€œEā€,ā€œmaleā€,53,138,191
ā€œEā€,ā€œfemaleā€,94,299,393
ā€œFā€,ā€œmaleā€,22,351,373
ā€œFā€,ā€œfemaleā€,24,317,341

Using the following R code to fit:

library(rstan)
options(mc.cores = parallel::detectCores())
d <- read.csv(ā€˜UCBadmit.csv’)
d$gid <- ifelse( d$applicant.gender==ā€œmaleā€ , 1L , 2L )
data_list = list(N=12, admits=d$admit, apps=d$applications, gid = d$gid)
fit = stan(ā€˜beta_bin2.stan’, data=data_list, chains=4, iter=1000)

1 Like

There must be a problem with the beta_binomial in the Math library in the current Rstan (@bgoodri, so you’re aware). I get lots of treedepth exceeded warnings in 2.21.2.

I wasn’t able to find record of this issue anywhere, but in cmdstanr (which uses a later version of the Math library) this model runs fine. Can you use cmdstanr for now: https://mc-stan.org/cmdstanr/reference/index.html ?

The interface is a bit different:

library(cmdstanr)
model = cmdstan_model("beta_bin2.stan")
fit = model$sample(data = data_list, num_chains = 4, parallel_chains = 4)

great! thanks for looking into this. I will definitely try out cmdstanr

1 Like