Context for the model: I have a total of n_b experiments. For each experiment, the goal is to infer the true fraction for the column named y. For this I have replicate measurements for each experiment. However, the number of replicates differs between experiments, so we pool the variance across experiments. We do this hierarchically by putting a gamma prior on tau, the rate parameter of the exponential distribution on the dispersion kappa.
I have the following Stan model. A snippet of how the input data looks is also attached.
data {
int<lower=0> N_ ; //number of data points or number of rows in input dataset
int<lower=0> n_b ; //number of unique experiments in dataset
int<lower=1, upper=n_b> condID[N_]; // experiment ID shared by all measurements of the same experiment
int<lower=0> y[N_]; // observed counts
int<lower=0> n[N_]; // number of trials for each measurement
}
parameters {
vector<lower = 0, upper = 1>[n_b] mu; // true fraction for each experiment
vector<lower = 0>[n_b] kappa; // dispersion for each experiment
real<lower = 0> tau; // rate of the exponential prior on kappa
}
transformed parameters {
vector<lower=0>[n_b] alpha;
vector<lower=0>[n_b] beta;
alpha = kappa .* mu;
beta = kappa - alpha; // equivalently kappa .* (1 - mu)
}
model {
tau ~ gamma(3,0.1); // causes partial pooling of the dispersion
mu ~ uniform(0, 1);
kappa ~ exponential(tau);
for (i in 1:N_) {
  y[i] ~ beta_binomial(n[i], alpha[condID[i]], beta[condID[i]]);
}
}
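In equations (just transcribing the Stan code above), with $b[i]$ denoting the experiment of measurement $i$:

$$
y_i \sim \text{BetaBinomial}\!\big(n_i,\ \kappa_{b[i]}\mu_{b[i]},\ \kappa_{b[i]}(1-\mu_{b[i]})\big), \quad
\mu_b \sim \text{Uniform}(0,1), \quad
\kappa_b \sim \text{Exponential}(\tau), \quad
\tau \sim \text{Gamma}(3,\,0.1).
$$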
Here are my questions:
1. All the diagnostics (divergences, tree depth, energy) look great. The Rhat and n_eff for the parameters I am interested in (all the mu's) are also quite good. However, the n_eff is too low for the tau and kappa parameters. Is that a problem, and if so, how do I start troubleshooting it? Simply running the chains longer is a bit of a problem, because my input data is large and a chain of 10000 iterations takes 7 hours.
2. Does this mean that the posterior distributions for kappa and tau are problematic, or is the parameterization itself the problem?
3. I would expect the kappas to be large, because I am fitting a large amount of data and pooling dispersion across it. Here is the pair plot from the posterior draws. Does it look okay?
I also implemented the non-centered parameterization described in the link; however, the n_eff is still low for some of the parameters. The Rhats do improve, and the model also runs faster with this reparameterization.
Nice, thanks. It’s good they don’t look super correlated, but kappa covers 6 orders of magnitude, and it isn’t parameterized on the log scale in the base model. That is gonna be really tough on the sampler.
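For illustration, here is one way to work with kappa on the log scale explicitly: give log kappa a normal hierarchical prior. This is only a sketch I haven't run against your data, and note that it replaces the exponential/gamma structure with a lognormal-type prior on kappa, so it changes the model, not just the parameterization:
data {
  int<lower=0> N_;
  int<lower=0> n_b;
  int<lower=1, upper=n_b> condID[N_];
  int<lower=0> y[N_];
  int<lower=0> n[N_];
}
parameters {
  vector<lower=0, upper=1>[n_b] mu;
  vector[n_b] log_kappa;           // dispersion, modelled on the log scale
  real mu_log_kappa;               // population mean of log kappa
  real<lower=0> sigma_log_kappa;   // population sd of log kappa
}
transformed parameters {
  vector<lower=0>[n_b] kappa = exp(log_kappa);
  vector<lower=0>[n_b] alpha = kappa .* mu;
  vector<lower=0>[n_b] beta = kappa - alpha;
}
model {
  mu_log_kappa ~ normal(0, 5);       // weakly informative guesses; adjust to your scale
  sigma_log_kappa ~ normal(0, 2);
  log_kappa ~ normal(mu_log_kappa, sigma_log_kappa);  // replaces kappa ~ exponential(tau)
  mu ~ uniform(0, 1);
  for (i in 1:N_) {
    y[i] ~ beta_binomial(n[i], alpha[condID[i]], beta[condID[i]]);
  }
}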
The baseball data in that link is very similar to yours. That example models batting averages. You should try your data in that model. It’s a more common way of doing this sort of regression.
I was looking at the batting averages example and came across this statement: "Figure 5.3 from (Gelman et al. 2014) plots the fitted values for ϕ and κ on the unconstrained scale, which is the space over which Stan is sampling. The variable ϕ ∈ [0,1] is transformed to logit(ϕ) = log(ϕ/(1−ϕ)) and κ is transformed to log κ. We reproduce that figure here for our running example."
This statement was made for the model where kappa wasn't parameterized on the log scale. Is this a mistake? Because I agree that kappa, in either model, isn't on the log scale. Or am I completely misunderstanding the statement?
If your kappa is in the millions but you choose a prior with an a-priori average of about 1/30 (tau ~ gamma(3, 0.1) has mean 3/0.1 = 30, and kappa ~ exponential(tau) then has prior mean 1/tau, i.e. roughly 1/30), then the chances of sampling reasonable values are (very) low. That means the model gets biased by the prior, and it slows the sampling.
Hint: if you have a beta_binomial(n, a, b) with a = mu * phi and b = (1 - mu) * phi, then when phi is low the variance is high, and vice versa.
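Concretely (just the standard beta-binomial moments, nothing specific to this model): with $a = \mu\phi$ and $b = (1-\mu)\phi$,

$$
\operatorname{Var}(y) = n\,\mu(1-\mu)\,\frac{\phi + n}{\phi + 1} = n\,\mu(1-\mu)\left(1 + \frac{n-1}{\phi + 1}\right),
$$

so as $\phi \to \infty$ the variance shrinks to the binomial value $n\mu(1-\mu)$, while small $\phi$ gives strong overdispersion.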
Oh wow, my apologies. I think I overstated it before with "It's a more common way of doing this sort of regression". Clearly it isn't so one-sided if the article I link does the other thing, haha.
What I meant to reference was the model in the section titled: “Model 4: Partial Pooling (Log Odds)”, which is different.
For your model, you’d start with this:
data {
int<lower=0> N_ ; //number of data points or number of rows in input dataset
int<lower=0> y[N_]; // observed counts
int<lower=0> n[N_]; // number of trials
}
parameters {
real mu; // population mean on the log-odds scale
real<lower = 0> tau; // population sd on the log-odds scale
vector[N_] alpha_z; // standardized (non-centered) per-observation effects
}
transformed parameters {
vector[N_] alpha = alpha_z * tau + mu;
}
model {
mu ~ normal(0, 1);
tau ~ normal(0, 1);
alpha_z ~ normal(0, 1);
for (i in 1:N_) {
  y[i] ~ binomial_logit(n[i], alpha[i]);
}
}
generated quantities {
vector[N_] p = inv_logit(alpha); // fitted proportion for each observation
}
and then add some group-level terms and whatnot (a rough sketch of that is below). That’s a non-centered version of the model, so it might be a little weird to read (also, I didn’t actually run it and check it – it could have bugs – I just made sure it compiled).
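For example, a minimal sketch of that group-level version, indexing the effects by your condID so you get one fitted fraction per experiment (again untested and illustrative only):
data {
  int<lower=0> N_;                     // number of measurements
  int<lower=0> n_b;                    // number of experiments
  int<lower=1, upper=n_b> condID[N_];  // experiment ID for each measurement
  int<lower=0> y[N_];                  // observed counts
  int<lower=0> n[N_];                  // number of trials
}
parameters {
  real mu;                   // population mean log-odds
  real<lower=0> tau;         // between-experiment sd on the log-odds scale
  vector[n_b] alpha_z;       // non-centered per-experiment effects
}
transformed parameters {
  vector[n_b] alpha = alpha_z * tau + mu;   // per-experiment log-odds
}
model {
  mu ~ normal(0, 1);
  tau ~ normal(0, 1);
  alpha_z ~ normal(0, 1);
  y ~ binomial_logit(n, alpha[condID]);     // vectorized over all measurements
}
generated quantities {
  vector[n_b] p = inv_logit(alpha);  // per-experiment fraction, analogous to mu in the beta-binomial model
}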