Apparently equivalent noncentered parameterization, different outcomes

I’m having a funny issue with a hierarchical model I’m fitting in RStan. It runs fine, and the posterior predictive checks all look reasonable. The model is a negative binomial regression with three crossed random effects, and I’ve used a noncentered parameterization to avoid funnel geometry. However, the model only runs without divergent transitions if I specify a redundant mean parameter for each of the random intercepts.
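(For reference, by “noncentered” I mean the usual reparameterization where the raw effects are sampled on a standard normal scale and then shifted and scaled. A minimal sketch for a single vector of random intercepts, with purely illustrative names:

// Centered: alpha and sigma are sampled jointly, which produces
// the funnel as sigma approaches zero
data {
  int<lower=1> J;
}
parameters {
  vector[J] alpha;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, sigma);
  sigma ~ exponential(1);
}

// Noncentered: the sampled parameter alpha_raw is a priori
// independent of sigma, which removes the funnel
data {
  int<lower=1> J;
}
parameters {
  vector[J] alpha_raw;
  real<lower=0> sigma;
}
transformed parameters {
  vector[J] alpha = sigma * alpha_raw;
}
model {
  alpha_raw ~ normal(0, 1);
  sigma ~ exponential(1);
}
)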

I think that the more “correct” approach is supposed to be something like this:

model {
  // A, B, C are the numbers of levels of each grouping factor and
  // AA, BB, CC the per-observation level indices, all declared in data
  vector[A] a1;
  vector[B] b2;
  vector[C] c2;
  vector[N] log_y_hat;

  for (a in 1:A) {
    a1[a] = b0 + sigma_a * raw_a[a];
  }

  for (b in 1:B) {
    b2[b] = sigma_b * raw_b[b];
  }

  for (c in 1:C) {
    c2[c] = sigma_c * raw_c[c];
  }

  for (i in 1:N) {
    log_y_hat[i] = a1[AA[i]] + b2[BB[i]] + c2[CC[i]] + coef * x[i];
  }

  y ~ neg_binomial_2_log(log_y_hat, phi);

  b0 ~ normal(0, 1);
  raw_a ~ normal(0, 1);
  raw_b ~ normal(0, 1);
  raw_c ~ normal(0, 1);

  sigma_a ~ exponential(1);
  sigma_b ~ exponential(1);
  sigma_c ~ exponential(1);

  phi ~ gamma(1, 2);
  coef ~ lognormal(-1, 1);
}
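(Side note: since a1, b2, and c2 are local to the model block here, they won’t appear in the output. If you want to monitor them, the same construction can move to transformed parameters; a sketch, assuming A, B, and C are declared in data:

transformed parameters {
  vector[A] a1;
  vector[B] b2;
  vector[C] c2;
  for (a in 1:A) a1[a] = b0 + sigma_a * raw_a[a];
  for (b in 1:B) b2[b] = sigma_b * raw_b[b];
  for (c in 1:C) c2[c] = sigma_c * raw_c[c];
}
)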

But the model only fits without divergent transitions if I include a redundant mean parameter for each of the random intercepts (a_mu, b_mu, c_mu):

model {
  // same local declarations and data assumptions as above
  vector[A] a1;
  vector[B] b2;
  vector[C] c2;
  vector[N] log_y_hat;

  for (a in 1:A) {
    a1[a] = a_mu + sigma_a * raw_a[a];
  }

  for (b in 1:B) {
    b2[b] = b_mu + sigma_b * raw_b[b];
  }

  for (c in 1:C) {
    c2[c] = c_mu + sigma_c * raw_c[c];
  }

  for (i in 1:N) {
    log_y_hat[i] = b0 + a1[AA[i]] + b2[BB[i]] + c2[CC[i]] + coef * x[i];
  }

  y ~ neg_binomial_2_log(log_y_hat, phi);

  b0 ~ normal(0, 1);
  raw_a ~ normal(0, 1);
  raw_b ~ normal(0, 1);
  raw_c ~ normal(0, 1);

  a_mu ~ normal(0, 1);
  b_mu ~ normal(0, 1);
  c_mu ~ normal(0, 1);

  sigma_a ~ exponential(1);
  sigma_b ~ exponential(1);
  sigma_c ~ exponential(1);

  phi ~ gamma(1, 2);
  coef ~ lognormal(-1, 1);
}

Have any of you run into this before? Or is there something I’m missing? I’d appreciate your thoughts!

Why do you feel that these parameters are redundant? In one case, you have a Gaussian multiplied by an exponential, and, in the other, the sum of a Gaussian and a Gaussian multiplied by an exponential. Their geometries are not the same.
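If it helps, you can see the difference by sampling the priors alone. A minimal prior-only sketch for a single effect, with purely illustrative names:

// Version 1: the effect is a standard normal scaled by an exponential
parameters {
  real raw;
  real<lower=0> sigma;
}
model {
  raw ~ normal(0, 1);
  sigma ~ exponential(1);
}
generated quantities {
  real effect_v1 = sigma * raw;
}

// Version 2: the effect also picks up its own normal mean
parameters {
  real mu;
  real raw;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 1);
  raw ~ normal(0, 1);
  sigma ~ exponential(1);
}
generated quantities {
  real effect_v2 = mu + sigma * raw;
}

Plotting draws of the effect against sigma makes the contrast visible: as sigma goes to zero, effect_v1 collapses to a point at zero, while effect_v2 stays spread out like its normal(0, 1) mean.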

Hi Corey,
Thanks, I see what you mean. I was thinking in terms of the intercepts’ role in the model rather than their geometries. Since one intercept is already specified (b0), I thought the random intercepts would only need to enter as deviations from that mean value. I don’t think it’s a problem to have those three extra intercept terms, but I’m hoping to understand why they’re necessary for the model to fit. (Right now I think my response to any reviewer questions about it would be kind of lame.)