Yeah, if the only difference between your two models is that one is centered and the other is non-centered, and the centered one works better with lots of data while the non-centered one works better with little data, that's exactly what you'd expect.
There are two different problems at work here. Here's code to reproduce both and plot them on the eight schools dataset:
eight_schools_centered.stan (334 Bytes) eight_schools_noncentered.stan (357 Bytes)
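In case you'd rather not download the attachments, this is roughly what's in them: the standard centered and non-centered eight schools parameterizations, with log_tau and z exposed because the plotting calls below use them. This is only a sketch, so the attached files may differ in details (priors, variable declarations); you can paste these into stan_model(model_code = ...) instead of pointing at the files.

centered_code = "
data {
  int<lower=0> J;
  vector[J] y;
  vector<lower=0>[J] sigma;
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[J] theta;
}
model {
  theta ~ normal(mu, tau);   // centered: the sampler works directly in theta
  y ~ normal(theta, sigma);
}
generated quantities {
  real log_tau = log(tau);
}
"

noncentered_code = "
data {
  int<lower=0> J;
  vector[J] y;
  vector<lower=0>[J] sigma;
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[J] z;
}
transformed parameters {
  vector[J] theta = mu + tau * z;   // non-centered: the sampler works in z
}
model {
  z ~ normal(0, 1);
  y ~ normal(theta, sigma);
}
generated quantities {
  real log_tau = log(tau);
}
"

# alternative to the file-based calls below:
# cmod = stan_model(model_code = centered_code)
# ncmod = stan_model(model_code = noncentered_code)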
library(rstan)
data_low = list(J = 8,
                y = c(28, 8, -3, 7, -1, 1, 18, 12),
                sigma = c(15, 10, 16, 11, 9, 11, 10, 18))
# really small measurement errors (small sigma) are like having lots of data
data_high = list(J = 8,
                 y = c(28, 8, -3, 7, -1, 1, 18, 12),
                 sigma = 0.01 * c(15, 10, 16, 11, 9, 11, 10, 18))
cmod = stan_model("eight_schools_centered.stan")
ncmod = stan_model("eight_schools_noncentered.stan")
fit_clow = sampling(cmod, data_low)     # centered, small data: divergences
fit_chigh = sampling(cmod, data_high)   # centered, lots of data: good
fit_nclow = sampling(ncmod, data_low)   # non-centered, small data: good
fit_nchigh = sampling(ncmod, data_high) # non-centered, lots of data: treedepth problems
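# To back up the comments above with numbers, you can pull the sampler diagnostics
# directly. count_divergences / count_treedepth_hits are just throwaway helpers for
# this post; exact counts will vary run to run, and 10 is Stan's default max_treedepth.
count_divergences = function(fit) {
  sp = get_sampler_params(fit, inc_warmup = FALSE)
  sum(sapply(sp, function(x) sum(x[, "divergent__"])))
}
count_treedepth_hits = function(fit, max_depth = 10) {
  sp = get_sampler_params(fit, inc_warmup = FALSE)
  sum(sapply(sp, function(x) sum(x[, "treedepth__"] >= max_depth)))
}
count_divergences(fit_clow)      # should be well above zero
count_treedepth_hits(fit_nchigh) # should be well above zero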
# plot log_tau instead of tau because log_tau is what the sampler sees
pairs(fit_clow, pars = c("theta[1]", "theta[3]", "mu", "log_tau"))
pairs(fit_chigh, pars = c("theta[1]", "theta[3]", "mu", "log_tau"))
# In the non-centered case the sampler works in z not in theta
pairs(fit_nclow, pars = c("z[1]", "z[3]", "mu", "log_tau"))
pairs(fit_nchigh, pars = c("z[1]", "z[3]", "mu", "log_tau"))
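The pairs plots show the geometry, but if you want the funnel on a single panel with the divergences marked, something like this works (a sketch; it assumes the centered model exposes log_tau as in the pairs() calls above, and relies on rstan returning both the draws and the sampler diagnostics ordered chain by chain):

draws = extract(fit_clow, pars = c("mu", "log_tau"), permuted = FALSE)
sp = get_sampler_params(fit_clow, inc_warmup = FALSE)
# flatten chain by chain so the draws line up with the divergent__ flags
mu = as.vector(draws[, , "mu"])
log_tau = as.vector(draws[, , "log_tau"])
divergent = unlist(lapply(sp, function(x) x[, "divergent__"])) == 1
plot(mu, log_tau, pch = 20, col = ifelse(divergent, "red", "black"),
     xlab = "mu", ylab = "log_tau", main = "fit_clow: mu vs log_tau")

The divergences should cluster down in the neck of the funnel (small log_tau), which is the usual signature of a centered parameterization with weakly informative data.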