Apparently equivalent noncentered parameterization, different outcomes

I’m having a funny issue with a hierarchical model I’m fitting in RStan. It runs fine, and the posterior predictive checks all look reasonable. The model is a negative binomial regression with three crossed random effects, and I’ve used a noncentered parameterization to avoid funnel geometry. However, the model only runs without divergent transitions if I specify a redundant mean parameter for each of the random intercepts.
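(For reference, by “noncentered” I mean the usual reparameterization where the raw effects are sampled on a standard normal scale and then shifted and scaled. A minimal sketch for a single vector of random intercepts, with purely illustrative names:

// Centered: alpha and sigma are sampled jointly, which produces
// the funnel as sigma approaches zero
data {
  int<lower=1> J;
}
parameters {
  vector[J] alpha;
  real<lower=0> sigma;
}
model {
  alpha ~ normal(0, sigma);
  sigma ~ exponential(1);
}

// Noncentered: the sampled parameter alpha_raw is a priori
// independent of sigma, which removes the funnel
data {
  int<lower=1> J;
}
parameters {
  vector[J] alpha_raw;
  real<lower=0> sigma;
}
transformed parameters {
  vector[J] alpha = sigma * alpha_raw;
}
model {
  alpha_raw ~ normal(0, 1);
  sigma ~ exponential(1);
}
)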

I think that the more “correct” approach is supposed to be something like this:

model {
  // A, B, C are the numbers of levels of each grouping factor and
  // AA, BB, CC the per-observation level indices, all declared in data
  vector[A] a1;
  vector[B] b2;
  vector[C] c2;
  vector[N] log_y_hat;

  for (a in 1:A) {
    a1[a] = b0 + sigma_a * raw_a[a];
  }

  for (b in 1:B) {
    b2[b] = sigma_b * raw_b[b];
  }

  for (c in 1:C) {
    c2[c] = sigma_c * raw_c[c];
  }

  for (i in 1:N) {
    log_y_hat[i] = a1[AA[i]] + b2[BB[i]] + c2[CC[i]] + coef * x[i];
  }

  y ~ neg_binomial_2_log(log_y_hat, phi);

  b0 ~ normal(0, 1);
  raw_a ~ normal(0, 1);
  raw_b ~ normal(0, 1);
  raw_c ~ normal(0, 1);

  sigma_a ~ exponential(1);
  sigma_b ~ exponential(1);
  sigma_c ~ exponential(1);

  phi ~ gamma(1, 2);
  coef ~ lognormal(-1, 1);
}
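(Side note: since a1, b2, and c2 are local to the model block here, they won’t appear in the output. If you want to monitor them, the same construction can move to transformed parameters; a sketch, assuming A, B, and C are declared in data:

transformed parameters {
  vector[A] a1;
  vector[B] b2;
  vector[C] c2;
  for (a in 1:A) a1[a] = b0 + sigma_a * raw_a[a];
  for (b in 1:B) b2[b] = sigma_b * raw_b[b];
  for (c in 1:C) c2[c] = sigma_c * raw_c[c];
}
)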

But the model only fits without divergent transitions if I include a redundant mean parameter for each of the random intercepts (a_mu, b_mu, c_mu):

model {
  // same local declarations and data assumptions as above
  vector[A] a1;
  vector[B] b2;
  vector[C] c2;
  vector[N] log_y_hat;

  for (a in 1:A) {
    a1[a] = a_mu + sigma_a * raw_a[a];
  }

  for (b in 1:B) {
    b2[b] = b_mu + sigma_b * raw_b[b];
  }

  for (c in 1:C) {
    c2[c] = c_mu + sigma_c * raw_c[c];
  }

  for (i in 1:N) {
    log_y_hat[i] = b0 + a1[AA[i]] + b2[BB[i]] + c2[CC[i]] + coef * x[i];
  }

  y ~ neg_binomial_2_log(log_y_hat, phi);

  b0 ~ normal(0, 1);
  raw_a ~ normal(0, 1);
  raw_b ~ normal(0, 1);
  raw_c ~ normal(0, 1);

  a_mu ~ normal(0, 1);
  b_mu ~ normal(0, 1);
  c_mu ~ normal(0, 1);

  sigma_a ~ exponential(1);
  sigma_b ~ exponential(1);
  sigma_c ~ exponential(1);

  phi ~ gamma(1, 2);
  coef ~ lognormal(-1, 1);
}

Have any of you run into this before? Or is there something I’m missing? I’d appreciate your thoughts!

Why do you feel that these parameters are redundant? In one case, you have a Gaussian multiplied by an exponential, and, in the other, the sum of a Gaussian and a Gaussian multiplied by an exponential. Their geometries are not the same.
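If it helps, you can see the difference by sampling the priors alone. A minimal prior-only sketch for a single effect, with purely illustrative names:

// Version 1: the effect is a standard normal scaled by an exponential
parameters {
  real raw;
  real<lower=0> sigma;
}
model {
  raw ~ normal(0, 1);
  sigma ~ exponential(1);
}
generated quantities {
  real effect_v1 = sigma * raw;
}

// Version 2: the effect also picks up its own normal mean
parameters {
  real mu;
  real raw;
  real<lower=0> sigma;
}
model {
  mu ~ normal(0, 1);
  raw ~ normal(0, 1);
  sigma ~ exponential(1);
}
generated quantities {
  real effect_v2 = mu + sigma * raw;
}

Plotting draws of the effect against sigma makes the contrast visible: as sigma goes to zero, effect_v1 collapses to a point at zero, while effect_v2 stays spread out like its normal(0, 1) mean.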

Hi Corey,
Thanks, I see what you mean. I was thinking in terms of the intercepts’ role in the model rather than their geometries. Since one intercept is already specified (b0), I thought the random intercepts would only need to enter as deviations from that mean value. I don’t think it’s a problem to have those three extra intercept terms, but I’m hoping to understand why they’re necessary for the model to fit. (Right now I think my response to any reviewer questions about it would be kind of lame.)