Beta_binomial fitting issues

stan1 · October 13, 2020, 8:10pm

Hi all,

I am having issues using beta_binomial. I started with trying to fit m12.1 in Statistical Rethinking (beta_binomial fit to classic UC Berkeley admissions data). The ulam generated stancode follows.

data{
    int N[12];
    int A[12];
    int gid[12];
}
parameters{
    vector[2] a;
    real<lower=0> phi;
}
transformed parameters{
    real theta;
    theta = phi + 2;
}
model{
    vector[12] pbar;
    phi ~ exponential( 1 );
    a ~ normal( 0 , 1.5 );
    for ( i in 1:12 ) {
        pbar[i] = a[gid[i]];
        pbar[i] = inv_logit(pbar[i]);
    }
    A ~ beta_binomial( N , pbar*theta , (1-pbar)*theta );
}

Warnings such as maximum treedepth, large R-hat, low ESS etc. are seen. I tried adjusting the priors with no luck.

Separating the beta_binomial into hierarchical pbar ~ beta(alpha,beta) and admits ~ binomial(apps, pbar) worked (code below). Moreover, the posterior samples (of a, phi, theta) had very similar summary statistics to the ones in Statistical Rethinking (R code 12.3 on pg. 372).

data{
  int N;
  int admits[N];
  int apps[N];
  int gid[N];
}

parameters{
  vector[2] a;
  real<lower=0> phi;
  vector<lower=0, upper=1>[N] pbar;
  
}

transformed parameters{
  vector[N] mu;
  vector[N] alpha;
  vector[N] beta;
  real<lower=2> theta;

  theta = phi + 2;
  for (i in 1:N){
    mu[i] = inv_logit(a[gid[i]]);
  }
  
  alpha = mu * theta;
  beta = (1-mu) * theta;
  

}

model{
  pbar ~ beta(alpha, beta);
  admits ~ binomial(apps, pbar);

  a ~ normal(0, 1.5);
  phi ~ exponential(1);
  
}

Should the separation into beta then binomial not be equivalent to using beta_binomial in one step? (some naming and other slight changes to practice my stan coding…). I also tried more or less exactly the above code but recombining into beta_binomial, which makes the beforeseen errors resurface.

data{
  int N;
  int admits[N];
  int apps[N];
  int gid[N];
}

parameters{
  vector[2] a;
  real<lower=0> phi;
  vector<lower=0, upper=1>[N] pbar;
  
}

transformed parameters{
  vector[N] mu;
  vector[N] alpha;
  vector[N] beta;
  real<lower=2> theta;

  theta = phi + 2;
  for (i in 1:N){
    mu[i] = inv_logit(a[gid[i]]);
  }
  
  alpha = mu * theta;
  beta = (1-mu) * theta;
  

}

model{

  
  admits ~ beta_binomial(apps, alpha, beta);

  a ~ normal(0, 1.5);
  phi ~ exponential(1);
  
}

Would appreciate any illumination into how to use beta_binomial better, or if my assumption of the equivalent model specifications is wrong. Thank you!

bbbales2 · October 15, 2020, 12:08pm

On the last model, if you’ve integrated out the betas again you don’t need pbar to be defined as a parameter. See if that helps. I agree it is weird the first model didn’t work, but I am curious to know if pbar is the problem with the third model first.

stan1 · October 15, 2020, 1:26pm

Ah yes I think that was an artifact of starting the second stan file via copy paste… just ran the following and the same issues persist

data{
  int N;
  int admits[N];
  int apps[N];
  int gid[N];
}

parameters{
  vector[2] a;
  real<lower=0> phi;

}

transformed parameters{
  vector[N] mu;
  vector[N] alpha;
  vector[N] beta;
  real<lower=2> theta;

  theta = phi + 2;
  for (i in 1:N){
    mu[i] = inv_logit(a[gid[i]]);
  }
  
  alpha = mu * theta;
  beta = (1-mu) * theta;
  

}

model{

  
  admits ~ beta_binomial(apps, alpha, beta);

  a ~ normal(0, 1.5);
  phi ~ exponential(1);
  
}

bbbales2 · October 15, 2020, 1:43pm

Can you post the data for this here or send me a link? This is strange. It seems like since this was the published example model it would work well and usually integrating out variables is good.

stan1 · October 15, 2020, 1:55pm

The .csv is:

“dept”,“applicant.gender”,“admit”,“reject”,“applications”
“A”,“male”,512,313,825
“A”,“female”,89,19,108
“B”,“male”,353,207,560
“B”,“female”,17,8,25
“C”,“male”,120,205,325
“C”,“female”,202,391,593
“D”,“male”,138,279,417
“D”,“female”,131,244,375
“E”,“male”,53,138,191
“E”,“female”,94,299,393
“F”,“male”,22,351,373
“F”,“female”,24,317,341

Using the following R code to fit:

library(rstan)
options(mc.cores = parallel::detectCores())
d <- read.csv(‘UCBadmit.csv’)
d$gid <- ifelse( d$applicant.gender==“male” , 1L , 2L )
data_list = list(N=12, admits=d$admit, apps=d$applications, gid = d$gid)
fit = stan(‘beta_bin2.stan’, data=data_list, chains=4, iter=1000)

bbbales2 · October 16, 2020, 2:25pm

There must be a problem with the beta_binomial in the Math library in the current Rstan (@bgoodri, so you’re aware). I get lots of treedepth exceeded warnings in 2.21.2.

I wasn’t able to find record of this issue anywhere, but in cmdstanr (which uses a later version of the Math library) this model runs fine. Can you use cmdstanr for now: https://mc-stan.org/cmdstanr/reference/index.html ?

The interface is a bit different:

library(cmdstanr)
model = cmdstan_model("beta_bin2.stan")
fit = model$sample(data = data_list, num_chains = 4, parallel_chains = 4)

stan1 · October 17, 2020, 4:23pm

great! thanks for looking into this. I will definitely try out cmdstanr

Topic		Replies	Views
Beta-Binomial model with rethinking package and dispersion for each interaction Modeling rethinking	2	620	March 16, 2023
Beta-binomial model Modeling	2	2532	February 10, 2019
I am having trouble with the custom family for beta binomial brms	6	749	April 30, 2020
Very different sampling with ~beta_binomial than with += beta_binomial_lpmf Developers rstan	5	910	November 30, 2020
Beta binomial model Modeling	1	1837	March 18, 2019

Beta_binomial fitting issues

Related topics