Beta_binomial fitting issues

Hi all,

I am having issues using beta_binomial. I started with trying to fit m12.1 in Statistical Rethinking (beta_binomial fit to classic UC Berkeley admissions data). The ulam generated stancode follows.

data{
    int N[12];
    int A[12];
    int gid[12];
}
parameters{
    vector[2] a;
    real<lower=0> phi;
}
transformed parameters{
    real theta;
    theta = phi + 2;
}
model{
    vector[12] pbar;
    phi ~ exponential( 1 );
    a ~ normal( 0 , 1.5 );
    for ( i in 1:12 ) {
        pbar[i] = a[gid[i]];
        pbar[i] = inv_logit(pbar[i]);
    }
    A ~ beta_binomial( N , pbar*theta , (1-pbar)*theta );
}

Warnings such as maximum treedepth, large R-hat, low ESS etc. are seen. I tried adjusting the priors with no luck.

Separating the beta_binomial into hierarchical pbar ~ beta(alpha,beta) and admits ~ binomial(apps, pbar) worked (code below). Moreover, the posterior samples (of a, phi, theta) had very similar summary statistics to the ones in Statistical Rethinking (R code 12.3 on pg. 372).

data{
  int N;
  int admits[N];
  int apps[N];
  int gid[N];
}

parameters{
  vector[2] a;
  real<lower=0> phi;
  vector<lower=0, upper=1>[N] pbar;
  
}

transformed parameters{
  vector[N] mu;
  vector[N] alpha;
  vector[N] beta;
  real<lower=2> theta;

  theta = phi + 2;
  for (i in 1:N){
    mu[i] = inv_logit(a[gid[i]]);
  }
  
  alpha = mu * theta;
  beta = (1-mu) * theta;
  

}

model{
  pbar ~ beta(alpha, beta);
  admits ~ binomial(apps, pbar);

  a ~ normal(0, 1.5);
  phi ~ exponential(1);
  
}

Should the separation into beta then binomial not be equivalent to using beta_binomial in one step? (some naming and other slight changes to practice my stan codingā€¦). I also tried more or less exactly the above code but recombining into beta_binomial, which makes the beforeseen errors resurface.

data{
  int N;
  int admits[N];
  int apps[N];
  int gid[N];
}

parameters{
  vector[2] a;
  real<lower=0> phi;
  vector<lower=0, upper=1>[N] pbar;
  
}

transformed parameters{
  vector[N] mu;
  vector[N] alpha;
  vector[N] beta;
  real<lower=2> theta;

  theta = phi + 2;
  for (i in 1:N){
    mu[i] = inv_logit(a[gid[i]]);
  }
  
  alpha = mu * theta;
  beta = (1-mu) * theta;
  

}

model{

  
  admits ~ beta_binomial(apps, alpha, beta);

  a ~ normal(0, 1.5);
  phi ~ exponential(1);
  
}

Would appreciate any illumination into how to use beta_binomial better, or if my assumption of the equivalent model specifications is wrong. Thank you!

On the last model, if youā€™ve integrated out the betas again you donā€™t need pbar to be defined as a parameter. See if that helps. I agree it is weird the first model didnā€™t work, but I am curious to know if pbar is the problem with the third model first.

Ah yes I think that was an artifact of starting the second stan file via copy pasteā€¦ just ran the following and the same issues persist

data{
  int N;
  int admits[N];
  int apps[N];
  int gid[N];
}

parameters{
  vector[2] a;
  real<lower=0> phi;

}

transformed parameters{
  vector[N] mu;
  vector[N] alpha;
  vector[N] beta;
  real<lower=2> theta;

  theta = phi + 2;
  for (i in 1:N){
    mu[i] = inv_logit(a[gid[i]]);
  }
  
  alpha = mu * theta;
  beta = (1-mu) * theta;
  

}

model{

  
  admits ~ beta_binomial(apps, alpha, beta);

  a ~ normal(0, 1.5);
  phi ~ exponential(1);
  
}

Can you post the data for this here or send me a link? This is strange. It seems like since this was the published example model it would work well and usually integrating out variables is good.

The .csv is:

ā€œdeptā€,ā€œapplicant.genderā€,ā€œadmitā€,ā€œrejectā€,ā€œapplicationsā€
ā€œAā€,ā€œmaleā€,512,313,825
ā€œAā€,ā€œfemaleā€,89,19,108
ā€œBā€,ā€œmaleā€,353,207,560
ā€œBā€,ā€œfemaleā€,17,8,25
ā€œCā€,ā€œmaleā€,120,205,325
ā€œCā€,ā€œfemaleā€,202,391,593
ā€œDā€,ā€œmaleā€,138,279,417
ā€œDā€,ā€œfemaleā€,131,244,375
ā€œEā€,ā€œmaleā€,53,138,191
ā€œEā€,ā€œfemaleā€,94,299,393
ā€œFā€,ā€œmaleā€,22,351,373
ā€œFā€,ā€œfemaleā€,24,317,341

Using the following R code to fit:

library(rstan)
options(mc.cores = parallel::detectCores())
d <- read.csv(ā€˜UCBadmit.csvā€™)
d$gid <- ifelse( d$applicant.gender==ā€œmaleā€ , 1L , 2L )
data_list = list(N=12, admits=d$admit, apps=d$applications, gid = d$gid)
fit = stan(ā€˜beta_bin2.stanā€™, data=data_list, chains=4, iter=1000)

1 Like

There must be a problem with the beta_binomial in the Math library in the current Rstan (@bgoodri, so youā€™re aware). I get lots of treedepth exceeded warnings in 2.21.2.

I wasnā€™t able to find record of this issue anywhere, but in cmdstanr (which uses a later version of the Math library) this model runs fine. Can you use cmdstanr for now: https://mc-stan.org/cmdstanr/reference/index.html ?

The interface is a bit different:

library(cmdstanr)
model = cmdstan_model("beta_bin2.stan")
fit = model$sample(data = data_list, num_chains = 4, parallel_chains = 4)

great! thanks for looking into this. I will definitely try out cmdstanr

1 Like