Beta binomial model

I have a problem similar to the rat-tumour example. I am interested in the marginal posterior density of mu (a parameter in the Stan code below, equal to alpha / (alpha + beta)). When I run the following Stan code on my data (attached), I get the attached graph, in which I compare the raw data with the sampled marginal posterior density. I have two questions:

  1. Most of my raw data points agree very well with each other, but the sampled posterior density doesn't seem right. The model seems to work well for examples where the raw data are not skewed towards the extremes. My initial guess was that the prior on kappa might be influencing the posterior and driving it away from the data. But even when I change the hyperprior's distribution completely (from uniform to exponential), the results don't change. Could someone explain conceptually what is going on?
  2. What is the best way to choose the distributions for the hyperpriors?


The following is the Stan code for the beta-binomial model I am using.


data {
     int<lower=2> J;          // number of coins
     int<lower=0> y[J];       // heads in respective coin trials
     int<lower=0> n[J];       // total in respective coin trials
}
parameters {
     real<lower = 0, upper = 1> mu;   // population mean, alpha / (alpha + beta)
     real<lower = 0> kappa;           // concentration, alpha + beta
}
transformed parameters {
    real<lower=0> alpha;
    real<lower=0> beta;

    alpha = kappa * mu;
    beta = kappa - alpha;
}
model {
    mu ~ uniform(0, 1);
    kappa ~ uniform(1, 100);  // alternative tried: exponential(0.05)
    y ~ beta_binomial(n, alpha, beta);
}

data-stan.csv (242 Bytes)

One thing that might be happening (and this is just my guess) is that the Beta distribution is not flexible enough for your case: perhaps no single Beta distribution can both put most of its mass very close to zero (to model most of your data) and still give reasonable probability to the 3 data points that are further from 0.
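To make that guess concrete, here is a minimal Python sketch (standard library only) that estimates by Monte Carlo how much mass a Beta(mu * kappa, (1 - mu) * kappa) distribution puts near zero and how much it puts further away. The function name `beta_region_masses`, the cutoffs 0.05 and 0.3, and the (mu, kappa) pairs are all illustrative assumptions, not values taken from the attached data.

```python
import random


def beta_region_masses(mu, kappa, near=0.05, far=0.3, n_draws=100_000, seed=0):
    """Estimate P(theta < near) and P(theta > far) by simulation,
    for theta ~ Beta(mu * kappa, (1 - mu) * kappa). Assumes near < far."""
    rng = random.Random(seed)
    alpha = mu * kappa           # same parameterisation as the Stan model
    beta = (1.0 - mu) * kappa
    n_near = 0
    n_far = 0
    for _ in range(n_draws):
        theta = rng.betavariate(alpha, beta)
        if theta < near:
            n_near += 1
        elif theta > far:
            n_far += 1
    return n_near / n_draws, n_far / n_draws


if __name__ == "__main__":
    # Hypothetical (mu, kappa) pairs, chosen only to show the trade-off.
    for mu, kappa in [(0.02, 100.0), (0.05, 20.0), (0.1, 2.0)]:
        p_near, p_far = beta_region_masses(mu, kappa)
        print(f"mu={mu} kappa={kappa} "
              f"P(theta<0.05)={p_near:.3f} P(theta>0.3)={p_far:.3f}")
```

Replacing the cutoffs with values that match your own data (where the bulk sits and where the 3 outlying points sit) should show whether any (mu, kappa) in the range allowed by your prior gives both regions reasonable mass.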

Choosing hyperpriors is generally a hard problem. You can follow the Prior Choice Recommendations page on the Stan wiki (stan-dev/stan on GitHub), but the best approach is to use prior predictive checks, as described in "Visualization in Bayesian workflow" (arXiv:1709.01449).
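As a rough illustration of a prior predictive check for this model, the Python sketch below (standard library only) draws mu and kappa from the priors in the posted code, then simulates data the way the model implies. Drawing theta from the Beta and then y from a Binomial is equivalent to drawing y from the beta-binomial directly. The function name `prior_predictive_proportions` and the trial counts in `n_trials` are hypothetical; substitute the `n` column from your own data.

```python
import random


def prior_predictive_proportions(n_trials, n_sims=1000, seed=0):
    """Simulate y/n for each coin under the priors
    mu ~ Uniform(0, 1), kappa ~ Uniform(1, 100).
    n_trials: list of positive trial counts, one per coin."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        # Stay just inside (0, 1) so alpha and beta are strictly positive.
        mu = rng.uniform(1e-6, 1.0 - 1e-6)
        kappa = rng.uniform(1.0, 100.0)
        alpha = kappa * mu
        beta = kappa - alpha
        props = []
        for n in n_trials:
            theta = rng.betavariate(alpha, beta)               # per-coin rate
            y = sum(rng.random() < theta for _ in range(n))    # Binomial(n, theta)
            props.append(y / n)
        sims.append(props)
    return sims


if __name__ == "__main__":
    n_trials = [20, 20, 50, 50, 100]  # hypothetical trial counts
    sims = prior_predictive_proportions(n_trials)
    flat = sorted(p for props in sims for p in props)
    print("median simulated proportion:", flat[len(flat) // 2])
    print("share of simulated proportions below 0.05:",
          sum(p < 0.05 for p in flat) / len(flat))
```

If the simulated proportions rarely look like your observed ones (for example, almost never piling up near zero), that suggests the priors, or the Beta family itself, are a poor match before any data are seen.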

Also note that you can use triple backticks (```) to mark the start and end of your code block, which makes the post easier to read.