Context for the model: I have a total of n_b experiments. For each experiment, the goal is to infer the true fraction for the column named y. For this I have replicate measurements for each experiment. However, the number of replicates differs between experiments, so we pool the variance across experiments. We do this hierarchically by putting a gamma prior on tau, the rate parameter of the exponential distribution on the dispersion kappa.
I have the following Stan model. A snippet of how the input data looks is also attached.
data {
int<lower=0> N_ ; //number of data points or number of rows in input dataset
int<lower=0> n_b ; //number of unique experiments in dataset
int<lower=1, upper=n_b> condID[N_]; // experiment ID shared by all measurements of the same experiment
int<lower=0> y[N_]; // observed counts
int<lower=0> n[N_]; // number of trials for each measurement
}
parameters {
vector<lower = 0, upper = 1>[n_b] mu; // true fraction for each experiment
vector<lower = 0>[n_b] kappa; // dispersion for each experiment
real<lower = 0> tau; // rate of the exponential prior on kappa
}
transformed parameters {
vector<lower=0>[n_b] alpha;
vector<lower=0>[n_b] beta;
alpha = kappa .* mu;
beta = kappa - alpha; // equivalently kappa .* (1 - mu)
}
model {
tau ~ gamma(3,0.1); // causes partial pooling of the dispersion
mu ~ uniform(0, 1);
kappa ~ exponential(tau);
for (i in 1:N_) {
  y[i] ~ beta_binomial(n[i], alpha[condID[i]], beta[condID[i]]);
}
}
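In equations (just transcribing the Stan code above), with $b[i]$ denoting the experiment of measurement $i$:

$$
y_i \sim \text{BetaBinomial}\!\big(n_i,\ \kappa_{b[i]}\mu_{b[i]},\ \kappa_{b[i]}(1-\mu_{b[i]})\big), \quad
\mu_b \sim \text{Uniform}(0,1), \quad
\kappa_b \sim \text{Exponential}(\tau), \quad
\tau \sim \text{Gamma}(3,\,0.1).
$$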
Here are my questions:
1. All the diagnostics (divergences, tree depth, energy) look great. The Rhat and n_eff for the parameters I am interested in (all the mu's) are also quite good. However, the n_eff is too low for the tau and kappa parameters. Is that a problem, and if so, how do I start troubleshooting it? Simply running the chains longer is a bit of a problem, because my input data is large and a chain of 10000 iterations takes 7 hours.
2. Does this mean that the posterior distributions for kappa and tau are problematic, or is the parameterization itself the problem?
3. I would expect the kappas to be large, because I am fitting a large amount of data and pooling dispersion across it. Here is the pair plot from the posterior draws. Does it look okay?
I also implemented the non-centered parameterization described in the link; however, the n_eff is still low for some of the parameters. The Rhats do improve, and the model also runs faster with this reparameterization.
Nice, thanks. It’s good they don’t look super correlated, but kappa covers 6 orders of magnitude, and it isn’t parameterized on the log scale in the base model. That is gonna be really tough on the sampler.
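For illustration, here is one way to work with kappa on the log scale explicitly: give log kappa a normal hierarchical prior. This is only a sketch I haven't run against your data, and note that it replaces the exponential/gamma structure with a lognormal-type prior on kappa, so it changes the model, not just the parameterization:
data {
  int<lower=0> N_;
  int<lower=0> n_b;
  int<lower=1, upper=n_b> condID[N_];
  int<lower=0> y[N_];
  int<lower=0> n[N_];
}
parameters {
  vector<lower=0, upper=1>[n_b] mu;
  vector[n_b] log_kappa;           // dispersion, modelled on the log scale
  real mu_log_kappa;               // population mean of log kappa
  real<lower=0> sigma_log_kappa;   // population sd of log kappa
}
transformed parameters {
  vector<lower=0>[n_b] kappa = exp(log_kappa);
  vector<lower=0>[n_b] alpha = kappa .* mu;
  vector<lower=0>[n_b] beta = kappa - alpha;
}
model {
  mu_log_kappa ~ normal(0, 5);       // weakly informative guesses; adjust to your scale
  sigma_log_kappa ~ normal(0, 2);
  log_kappa ~ normal(mu_log_kappa, sigma_log_kappa);  // replaces kappa ~ exponential(tau)
  mu ~ uniform(0, 1);
  for (i in 1:N_) {
    y[i] ~ beta_binomial(n[i], alpha[condID[i]], beta[condID[i]]);
  }
}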
The baseball data in that link is very similar to yours. That example models batting averages. You should try your data in that model. It’s a more common way of doing this sort of regression.
I was looking at the batting averages example and came across this statement: "Figure 5.3 from (Gelman et al. 2014) plots the fitted values for ϕ and κ on the unconstrained scale, which is the space over which Stan is sampling. The variable ϕ ∈ [0,1] is transformed to logit(ϕ) = log(ϕ/(1−ϕ)) and κ is transformed to log κ. We reproduce that figure here for our running example."
This statement was made for the model where kappa wasn't parameterized on the log scale. Is this a mistake? Because I agree that kappa, in either model, isn't on the log scale. Or am I completely misunderstanding the statement?
If your kappa is in the millions but you choose a prior with an a-priori average of about 1/30 (tau ~ gamma(3, 0.1) has mean 3/0.1 = 30, and kappa ~ exponential(tau) then has prior mean 1/tau, i.e. roughly 1/30), then the chances of sampling reasonable values are (very) low. That means the model gets biased by the prior, and it slows the sampling.
Hint: if you have a beta_binomial(n, a, b) with a = mu * phi and b = (1 - mu) * phi, then when phi is low the variance is high, and vice versa.
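Concretely (just the standard beta-binomial moments, nothing specific to this model): with $a = \mu\phi$ and $b = (1-\mu)\phi$,

$$
\operatorname{Var}(y) = n\,\mu(1-\mu)\,\frac{\phi + n}{\phi + 1} = n\,\mu(1-\mu)\left(1 + \frac{n-1}{\phi + 1}\right),
$$

so as $\phi \to \infty$ the variance shrinks to the binomial value $n\mu(1-\mu)$, while small $\phi$ gives strong overdispersion.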
Oh wow, my apologies. I think I overstated it before with "It's a more common way of doing this sort of regression". Clearly it isn't so one-sided if the article I link does the other thing, haha.
What I meant to reference was the model in the section titled: “Model 4: Partial Pooling (Log Odds)”, which is different.
For your model, you’d start with this:
data {
int<lower=0> N_ ; //number of data points or number of rows in input dataset
int<lower=0> y[N_]; // observed counts
int<lower=0> n[N_]; // number of trials
}
parameters {
real mu; // population mean on the log-odds scale
real<lower = 0> tau; // population sd on the log-odds scale
vector[N_] alpha_z; // standardized (non-centered) per-observation effects
}
transformed parameters {
vector[N_] alpha = alpha_z * tau + mu;
}
model {
mu ~ normal(0, 1);
tau ~ normal(0, 1);
alpha_z ~ normal(0, 1);
for (i in 1:N_) {
  y[i] ~ binomial_logit(n[i], alpha[i]);
}
}
generated quantities {
vector[N_] p = inv_logit(alpha); // fitted proportion for each observation
}
and then add some group-level terms and whatnot (a rough sketch of that is below). That’s a non-centered version of the model, so it might be a little weird to read (also, I didn’t actually run it and check it – it could have bugs – I just made sure it compiled).
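For example, a minimal sketch of that group-level version, indexing the effects by your condID so you get one fitted fraction per experiment (again untested and illustrative only):
data {
  int<lower=0> N_;                     // number of measurements
  int<lower=0> n_b;                    // number of experiments
  int<lower=1, upper=n_b> condID[N_];  // experiment ID for each measurement
  int<lower=0> y[N_];                  // observed counts
  int<lower=0> n[N_];                  // number of trials
}
parameters {
  real mu;                   // population mean log-odds
  real<lower=0> tau;         // between-experiment sd on the log-odds scale
  vector[n_b] alpha_z;       // non-centered per-experiment effects
}
transformed parameters {
  vector[n_b] alpha = alpha_z * tau + mu;   // per-experiment log-odds
}
model {
  mu ~ normal(0, 1);
  tau ~ normal(0, 1);
  alpha_z ~ normal(0, 1);
  y ~ binomial_logit(n, alpha[condID]);     // vectorized over all measurements
}
generated quantities {
  vector[n_b] p = inv_logit(alpha);  // per-experiment fraction, analogous to mu in the beta-binomial model
}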