Partially pooled beta binomial model

Context for the model: I have total n_b experiments. For each experiment, the goal is to infer the true fraction of the column named y. For this I have replicate kind of data measurements for each experiment. However the number of replicates for each experiment are different, so we tried to pool the variance across experiments. We try to achieve this hierarchically by putting a gamma prior on the parameter for the distribution of dispersion kappa.

I have the following stan model.
Attached is also the snippet of how the input data looks. (


data {
     int<lower=0> N_ ;  //number of data points or number of rows in input dataset
     int<lower=0> n_b ; //number of unique experiments in dataset 
     int<lower=0, upper=n_b> condID[N_]; // vector that gives same identity to the measurements  of same experiment
     int<lower=0> y[N_];       // observed data
     int<lower=0> n[N_];       
parameters {
     vector<lower = 0, upper = 1>[n_b] mu;
     vector<lower = 0>[n_b] kappa;
     real<lower = 0> tau;
transformed parameters {
    vector<lower=0>[n_b] alpha;
    vector<lower=0>[n_b] beta;
    alpha= kappa .*  mu ;
    beta= kappa - alpha;
model {
    tau ~ gamma(3,0.1);  // causes partial pooling of the dispersion 
    mu ~ uniform(0, 1);
    kappa ~ exponential(tau); 
    for(i in 1:N_){

      y[i] ~ beta_binomial(n[i], alpha[ condID[i] ], beta[ condID[i] ]) ;

Here are my questions.
All the diagnostics ( divergence, tree depth, energy) look great. The Rhat and neff for the parameters I am interested in (all the mu’s) are also quite good. However the neff is too low for the tau and kappa parameters (

), even though rhats are acceptable.!

  1. Does it affect the inferences made for the mu’s?

  2. if so, how do I start troubleshooting ? Running the chains longer seems a bit of a problem because my input data is too large and a chain of 10000 iteration takes 7 hours.
    It seems to me that the parameter distributions for kappa and tau are problematic? or the parameterization itself is a problem ?


Those kappas seem really large. Is that right? Check a pairplot of kappa and mu. I’m curious if those are correlated.

I’d parameterize this problem differently. Maybe use normal distributions as the hierarchical priors and then use a link function on a binomial.

Check out the example in here: Hierarchical Partial Pooling for Repeated Binary Trials

If you see divergences, you’ll want to non-center that. That link shows how.

The latest in Rhat is keep it < 1.01 ([1903.08008] Rank-normalization, folding, and localization: An improved $\widehat{R}$ for assessing convergence of MCMC – this Rhat is more conservative than the one currently implemented in Stan).

Thanks for your reply!

I would expect kappas to be large because I am fitting a large amount of data and pooling dispersion across this data. Here is the pairplot from the posterior draws. It looks okay? image

I also implemented the non centered parametrization, described in the link, however the neff is still low for some of the parameters. Rhats do improve and model also runs faster with this reparametrization.

There were also no divergences with either model.

Nice, thanks. It’s good they don’t look super correlated, but kappa covers 6 orders of magnitude, and it isn’t parameterized on the log scale in the base model. That is gonna be really tough on the sampler.

The baseball data in that link is very similar to yours. That example models batting averages. You should try your data in that model. It’s a more common way of doing this sort of regression.

(and I think coding the model that way might speed everything up and fix your low N_eff/high Rhat problems)

Thanks for your suggestions!

I was looking at the batting averages example and came across this statement. " Figure 5.3 from (Gelman et al. 2014) plots the fitted values for ϕ and κ on the unconstrained scale, which is the space over which Stan is sampling. The variable ϕ∈[0,1] is transformed to logit(ϕ)=log(ϕ/(1−ϕ))and κ is transformed to log⁡κ. We reproduce that figure here for our running example."
This statement was made for the model where Kappa wasnt parametrized according to the log scale. Is this a mistake? because I agree that kappa, in either model, isnt on the log scale. Or I am completely misunderstanding the statements.

Thanks again!

One comment:
E[tau] = 30
=> A-priori E[kappa] ~ 1/30

Your kappa is high, your prior is low. You could use inv_gamma instead of gamma.
Just my 2 cents.

It was intentional to be more conservative in the priors. I do not know if that affects the sampler speed / chain mixing?

If your kappa is in the millions but you choose an average a-prior of 1/30 then the chances of sampling
reasonable values are (very) low. That means the model gets biased by the prior and it slows the sampling.
Hint: If you have a beta_binomial(a, b) with a=mu * phi, b= (1-mu) * phi and phi is low the variance is high and vice versa.

Oh wow, my apologies. I think I over exaggerated my statement before, “It’s a more common way of doing this sort of regression”. Clearly this isn’t so one sided if I link an article that does the other thing haha.

What I meant to reference was the model in the section titled: “Model 4: Partial Pooling (Log Odds)”, which is different.

For your model, you’d start with this:

data {
  int<lower=0> N_ ; //number of data points or number of rows in input dataset
  int<lower=0> y[N_]; // observed data
  int<lower=0> n[N_];

parameters {
  real mu;
  real<lower = 0> tau;
  vector[N_] alpha_z;

transformed parameters {
  vector[N_] alpha = alpha_z * tau + mu;

model {
  mu ~ normal(0, 1);
  tau ~ normal(0, 1);
  alpha_z ~ normal(0, 1);
  for(i in 1:N_){
    y[i] ~ binomial_logit(n[i], alpha[i]);

generated quantities {
  vector[N_] p = inv_logit(alpha);

and then add some group level terms and whatnot. That’s a non-centered version of the model, so it might be a little weird to read (also, I didn’t actually run that and check it – could have bugs – I just made sure it compiled).

I looked this up in Gelman & Hill (Home page for the book, "Data Analysis Using Regression and Multilevel/Hierarchical Models"), there were a couple pages on it: binomial_stuff.pdf (2.3 MB)