Mean probability of success in hierarchical model for repeated binary trials

I have a question related to the following case study Hierarchical Partial Pooling for Repeated Binary Trials. I am trying to understand how different parameterizations for the chance of success (\theta_i) or the log-odds of success (\alpha_i) affect the posterior distribution for the mean probability of success.

I used the Stan code linked in this repository,, to fit 3 models to the rat tumor data referenced in the linked note at the top of my post. The three models were (1) a hierarchical model with a direct parameterization for the mean probability of success (hier.stan), (2) a hierarchical model with a log-odds probability of success and centered parameterization (hier-logit-centered.stan), and (3) a hierarchical model with a log-odds probability of success and non-centered parameterization (hier-logit.stan).

All the models have the same data block.

 data {
      int<lower=0> N;              // items
      int<lower=0> K[N];           // initial trials
      int<lower=0> y[N];           // initial successes

This is the parameters and model block for the model with mean (phi) and concentration (kappa) (hier.stan).

parameters {
  real<lower=0, upper=1> phi;         // population chance of success
  real<lower=1> kappa;                // population concentration
  vector<lower=0, upper=1>[N] theta;  // chance of success 
model {
  kappa ~ pareto(1, 1.5);                        // hyperprior
  theta ~ beta(phi * kappa, (1 - phi) * kappa);  // prior
  y ~ binomial(K, theta);                        // likelihood

This is the parameters and model block for the centered model with log-odds mean probability (mu) and log-odds population sd (sigma) (hier-logit.stan). The hyperparameters are similar in the non-centered model.

parameters {
  real mu;                       // population mean of success log-odds
  real<lower=0> sigma;           // population sd of success log-odds
  vector[N] alpha;               // success log-odds
model {
  mu ~ normal(-1, 1);               // hyperprior
  sigma ~ normal(0, 1);             // hyperprior
  alpha ~ normal(mu, sigma);        // prior (hierarchical)
  y ~ binomial_logit(K, alpha);     // likelihood

My question is about the population-level mean probability of success. I think this would be \phi in the hierarchical model parameterized by mean and concentration, and \mathrm{logit}^{-1}(\mu) in the centered and non-centered hierarchical models with the logit-link.

The figure below plots the posteriors, medians and 95% credible intervals for these parameters for the 3 models. Does anyone have suggestions for that might help me understand why I get different distributions and different medians and 95% credible intervals? Please let me know if I need to update this question with more information for it to be answered!

1 Like

Hey I’m sorry noone has answered yet. I think you’ve asked your questions well. Prima facie it looks to me like the logit and binomial models have different priors for their parameters. If I remember right, the rat tumor dataset is small and presumably the prior has considerable sway. You may be able to prove or disprove my guess by the constants for kappa and see how it effects the resultant distribution.

@Max_Mantei any other ideas?



Priors could be at play here, as @emiruz mentioned. The rats data you linked to actually had quite some cases, so I think the discrepancy due to priors is relatively small.

The real problem, however, is, that, while E[\theta^\text{Beta}] = \phi is true, E[\theta^\text{Logit}]\neq\text{logit}^{-1}(\mu), cf. Jensen’s inequality. I don’t have time to work this out mathematically, but if you want to do it, you’d have to look into change of variables stuff.

You can compare the inference of \theta^\text{Beta} and \theta^\text{Logit}, though. Just add:

transformed parameters{
  vector[N] theta = inv_logit(alpha);

to your logit-models and compare the thetas (their means, quantiles, etc.) directly. My guess would be that there is not a huge difference between those.

Hope this helps. :)


Thanks to both of you for your responses! I think the issues are those identified by @Max_Mantei. After making the comparisons (see below) it seems like the models are comparable. Thanks for helping with this!

The thetas seem pretty similar across the models. Comparisons for all three models are plotted below:

I also compared the posterior distributions of population probability of success. I added the following to my models,

For the hierarchical model parameterized with mean and concentration:

theta_pop = beta_rng(phi*kappa,(1-phi)*kappa);

For the centered and noncentered models with a logit-link, I added:

theta_pop = inv_logit(normal_rng(mu,sigma));

The plot below shows the posterior for \theta_pop. It looks similar for all three models, though there are still some differences between them:


Cool! Thanks for checking back and posting results! :)