Context for the model: I have n_b experiments in total. For each experiment, the goal is to infer the true fraction for the column named y. For this I have replicate measurements for each experiment. However, the number of replicates differs across experiments, so we pool the variance across experiments. We do this hierarchically by putting a gamma prior on tau, the rate parameter of the distribution of the dispersion kappa.
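Written out, the hierarchy I have in mind is (with c_i = condID[i]):

$$
\begin{aligned}
\tau &\sim \mathrm{Gamma}(3,\ 0.1)\\
\kappa_b &\sim \mathrm{Exponential}(\tau), \qquad b = 1,\dots,n_b\\
\mu_b &\sim \mathrm{Uniform}(0,\ 1)\\
y_i &\sim \mathrm{BetaBinomial}\bigl(n_i,\ \kappa_{c_i}\mu_{c_i},\ \kappa_{c_i}(1-\mu_{c_i})\bigr), \qquad i = 1,\dots,N
\end{aligned}
$$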
I have the following Stan model. Also attached is a snippet of how the input data look.
```stan
data {
int<lower=0> N_ ; //number of data points or number of rows in input dataset
int<lower=0> n_b ; //number of unique experiments in dataset
int<lower=0, upper=n_b> condID[N_]; // vector that gives same identity to the measurements of same experiment
int<lower=0> y[N_]; // observed data
int<lower=0> n[N_]; // number of trials for each measurement
}
parameters {
vector<lower = 0, upper = 1>[n_b] mu;
vector<lower = 0>[n_b] kappa;
real<lower = 0> tau;
}
transformed parameters {
vector<lower=0>[n_b] alpha;
vector<lower=0>[n_b] beta;
alpha = kappa .* mu;
beta = kappa - alpha; // i.e. beta = kappa .* (1 - mu)
}
model {
tau ~ gamma(3,0.1); // causes partial pooling of the dispersion
mu ~ uniform(0, 1);
kappa ~ exponential(tau);
for(i in 1:N_){
y[i] ~ beta_binomial(n[i], alpha[ condID[i] ], beta[ condID[i] ]) ;
}
}
```
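For reference, a minimal sketch of how data with the structure declared in the `data` block could be simulated from this generative model (all names and numbers here are illustrative, not from my actual dataset):

```python
import numpy as np

rng = np.random.default_rng(1)

n_b = 4                                           # number of experiments
reps = rng.integers(2, 8, size=n_b)               # unequal replicate counts per experiment
condID = np.repeat(np.arange(1, n_b + 1), reps)   # 1-based experiment IDs, as Stan expects
N_ = condID.size

mu = rng.uniform(0, 1, size=n_b)                  # true fraction per experiment
tau = rng.gamma(3, 1 / 0.1)                       # numpy parameterizes gamma by scale = 1/rate
kappa = rng.exponential(1 / tau, size=n_b)        # likewise, exponential takes scale = 1/rate

n = rng.integers(50, 200, size=N_)                # trials per measurement
# beta-binomial draw: p ~ Beta(kappa*mu, kappa*(1-mu)), then y ~ Binomial(n, p)
p = rng.beta(kappa[condID - 1] * mu[condID - 1],
             kappa[condID - 1] * (1 - mu[condID - 1]))
y = rng.binomial(n, p)

stan_data = dict(N_=N_, n_b=n_b, condID=condID, y=y, n=n)
```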
Here are my questions.
All the diagnostics (divergences, tree depth, energy) look great. The Rhat and n_eff for the parameters I am interested in (all the mu's) are also quite good. However, the n_eff is much too low for the tau and kappa parameters.

Does the low n_eff for tau and kappa affect the inferences made for the mu's?

If so, how do I start troubleshooting? Running the chains longer is a bit of a problem because my input data is large and a chain of 10,000 iterations takes 7 hours.
Is the problem the prior distributions for kappa and tau, or the parameterization itself?
Thanks!