Problems remain after non-centered parametrization of correlated parameters

tlyim · June 19, 2019, 4:28pm

Thanks for the suggestions in your reply to my other post. I am working on the plots you suggested.

Meanwhile, I looked at the effect of changing max_depth and realized that the values of 9 and 10 that I had been using lead to 600 and 121 transitions hitting the limit, respectively (out of 3 x 200 = 600 transitions). Increasing to max_depth=11 or above avoids hitting the limit completely.
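To make these numbers concrete, here is a minimal Python sketch of the arithmetic behind the treedepth warnings. It takes the cap of 2^max_depth leapfrog steps from the warning text quoted below, and the hit counts from the three runs in this post; the function names are my own and are not part of CmdStan.

```python
def leapfrog_cap(max_depth: int) -> int:
    """Leapfrog-step cap per transition, 2^max_depth as stated in the warning."""
    return 2 ** max_depth


def hit_fraction(n_hit: int, n_total: int) -> float:
    """Share of transitions that stopped because they reached the cap."""
    return n_hit / n_total


# Counts reported in this post for 3 chains x 200 = 600 transitions.
hits = {9: 600, 10: 121, 11: 0}

for depth, n_hit in hits.items():
    cap = leapfrog_cap(depth)
    share = hit_fraction(n_hit, 600)
    print(f"max_depth={depth}: cap={cap} leapfrog steps, {share:.0%} hit the limit")
```

Each extra unit of max_depth doubles the cap, so a transition that keeps hitting the limit can cost up to twice as many gradient evaluations as before.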

The max_depth=9 case also has split R-hat greater than 1.1 for gw_sd[3] and gw_sd[5]. But all the parameters of interest (reported below) have R-hat ~ 1.0, and the run time is significantly shorter than in the max_depth=10 case (where no parameter has R-hat > 1.1).
- Should R-hat > 1.1 be avoided at all costs, even when (i) the problem does not affect the good R-hat of the parameters of interest; (ii) their mean estimates remain very similar; and (iii) ignoring the issue gives a much shorter run time?
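For readers unfamiliar with the diagnostic, below is a minimal sketch of the classic split R-hat: each chain is cut in half, and the between-half variance is compared with the within-half variance. I am assuming this is close to what the summary tool reports, but the function name and the toy data are made up for illustration and are not taken from my model.

```python
import random
from statistics import mean, variance


def split_rhat(chains):
    """Classic split R-hat for one parameter.

    `chains` is a list of equal-length lists of draws, one list per chain.
    """
    # Split every chain in half so that drift within a chain is also detected.
    halves = []
    for chain in chains:
        half = len(chain) // 2
        halves.append(chain[:half])
        halves.append(chain[len(chain) - half:])

    n = len(halves[0])
    between = n * variance([mean(h) for h in halves])  # B
    within = mean([variance(h) for h in halves])       # W
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5


# Toy data (hypothetical): three chains that agree, and three where one is offset.
rng = random.Random(0)
mixed = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(3)]
stuck = [[rng.gauss(mu, 1) for _ in range(200)] for mu in (0, 0, 3)]

print("well mixed:", round(split_rhat(mixed), 3))
print("one chain offset:", round(split_rhat(stuck), 3))
```

The diagnostic is computed per parameter, which is why gw_sd[3] and gw_sd[5] can be flagged while the parameters of interest are not.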
I have looked at max_depth=11 to 13 as well. The warmup time increases quite substantially, even though the final sampling time stays very similar, as do the mean estimates, N_Eff, and R-hat of the parameters of interest.
- Is it still worth trying a much bigger max_depth, like 20, 30, …?
I had a look at the CmdStan manual concerning init_buffer, term_buffer, and window, and the adaptation process, and I think I have a basic understanding. I also looked around (the Stan user guide, the forum, …) and found only very limited related info (such as this post). Another discussion by @aaronjg sounds very relevant, but most of the responses in that post are too technical for me to follow.
- My take from that post is that I can reduce the overall warmup time by forcing more iterations into the fast init_buffer and term_buffer intervals (e.g., 200 each, instead of the defaults of 75 and 50). Am I getting this right? Is it useful to adjust the stepsize at the same time, and if so, how?
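To check my understanding of the adaptation schedule, here is a simplified Python sketch of how the warmup iterations are divided. In the manual's terminology the init and term buffers adapt only the step size, while the windows in between also estimate the metric and double in length each time. The defaults used (init_buffer=75, term_buffer=50, window=25) are the documented ones; num_warmup=1000 is a hypothetical value, and the function is my own approximation rather than Stan's actual code, which treats some edge cases differently.

```python
def slow_windows(num_warmup, init_buffer=75, term_buffer=50, window=25):
    """Lengths of the slow (metric-adapting) windows between the two buffers."""
    remaining = num_warmup - init_buffer - term_buffer
    if remaining < window:
        # Stan handles this case differently; this sketch simply refuses.
        raise ValueError("buffers leave no room for a slow window")

    sizes = []
    size = window
    while remaining > 0:
        # If the next (doubled) window would not fit afterwards,
        # stretch the current window to the end of the slow phase.
        if remaining - size < 2 * size:
            size = remaining
        sizes.append(size)
        remaining -= size
        size *= 2
    return sizes


# num_warmup=1000 is hypothetical; compare the defaults with buffers of 200 each.
for init, term in [(75, 50), (200, 200)]:
    print(f"init_buffer={init}, term_buffer={term}:", slow_windows(1000, init, term))
```

With the total warmup held fixed, enlarging the two buffers shortens the slow phase, so fewer iterations go into estimating the metric. Whether that trade-off actually reduces the wall-clock warmup time for this model is exactly what I am unsure about.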

Would appreciate very much your feedback. Many thanks.

====================================================

max_depth=9:

600 of 600 (1e+02%) transitions hit the maximum treedepth limit of 9, or 2^9 leapfrog steps. 
Trajectories that are prematurely terminated due to this limit will result in slow exploration 
and you should increase the limit to ensure optimal performance.

The following parameters had split R-hat greater than 1.1:
  gw_sd[3], gw_sd[5]
Such high values indicate incomplete mixing and biasedestimation.  
You should consider regularization your model with additional prior information 
or looking for a more effective parameterization.
 
3 chains: each with iter=(200,200,200); warmup=(0,0,0); thin=(1,1,1); 600 iterations saved.
Warmup took (11398, 11641, 10726) seconds, 9.4 hours total
Sampling took (2992, 2994, 3009) seconds, 2.5 hours total
                       Mean     MCSE   StdDev        5%       50%       95%    N_Eff  N_Eff/s    R_hat
lp__                1.9e+04  1.2e+00  1.3e+01   1.9e+04   1.9e+04   1.9e+04  1.2e+02  1.3e-02  1.0e+00
accept_stat__       9.9e-01  1.6e-03  2.4e-02   9.5e-01   1.0e+00   1.0e+00  2.3e+02  2.5e-02  1.0e+00
stepsize__          1.4e-02  1.7e-03  2.1e-03   1.1e-02   1.4e-02   1.6e-02  1.5e+00  1.7e-04  8.0e+14
treedepth__         9.0e+00     -nan  1.4e-14   9.0e+00   9.0e+00   9.0e+00     -nan     -nan  9.9e-01
n_leapfrog__        5.1e+02     -nan  2.1e-12   5.1e+02   5.1e+02   5.1e+02     -nan     -nan     -nan
divergent__         0.0e+00     -nan  0.0e+00   0.0e+00   0.0e+00   0.0e+00     -nan     -nan     -nan
energy__           -1.8e+04  1.5e+00  1.9e+01  -1.8e+04  -1.8e+04  -1.8e+04  1.5e+02  1.7e-02  1.0e+00
sd_y                8.1e-02  2.0e-05  5.9e-04   8.0e-02   8.1e-02   8.2e-02  8.9e+02  9.9e-02  1.0e+00
mu_u1               8.1e-02  3.2e-04  9.2e-03   6.5e-02   8.1e-02   9.5e-02  8.4e+02  9.4e-02  1.0e+00
mu_alpha            4.3e-02  7.6e-05  1.9e-03   4.0e-02   4.3e-02   4.6e-02  6.5e+02  7.2e-02  1.0e+00
beta                5.8e-01  2.8e-04  8.1e-03   5.7e-01   5.8e-01   6.0e-01  8.0e+02  8.9e-02  1.0e+00
theta               1.6e-01  1.1e-04  3.4e-03   1.5e-01   1.6e-01   1.6e-01  9.6e+02  1.1e-01  1.0e+00
sd_season           9.9e-02  1.7e-04  4.4e-03   9.1e-02   9.8e-02   1.1e-01  6.8e+02  7.5e-02  1.0e+00
mu_season[1]       -1.2e-01  3.6e-04  1.1e-02  -1.4e-01  -1.2e-01  -1.0e-01  8.5e+02  9.4e-02  1.0e+00
mu_season[2]       -6.9e-02  4.1e-04  1.0e-02  -8.6e-02  -7.0e-02  -5.2e-02  6.1e+02  6.8e-02  1.0e+00
mu_season[3]        1.4e-01  4.0e-04  1.0e-02   1.2e-01   1.4e-01   1.6e-01  6.3e+02  7.0e-02  1.0e+00
p[1]                7.0e-01  2.4e-03  5.2e-02   6.3e-01   6.9e-01   7.9e-01  4.8e+02  5.3e-02  1.0e+00
p[2]                6.2e-01  2.5e-04  6.0e-03   6.1e-01   6.2e-01   6.3e-01  5.8e+02  6.5e-02  1.0e+00
p[3]                6.8e-01  3.4e-03  7.4e-02   5.8e-01   6.6e-01   8.2e-01  4.7e+02  5.3e-02  1.0e+00
g[1]                8.5e-01  1.8e-03  4.8e-02   7.8e-01   8.6e-01   9.3e-01  6.8e+02  7.6e-02  1.0e+00
g[2]                3.3e-01  8.1e-04  2.0e-02   2.9e-01   3.3e-01   3.6e-01  6.4e+02  7.1e-02  1.0e+00
w[1]                6.4e-01  3.6e-03  6.9e-02   5.3e-01   6.3e-01   7.5e-01  3.7e+02  4.1e-02  1.0e+00
w[2]                1.5e-01  6.0e-04  1.3e-02   1.3e-01   1.5e-01   1.8e-01  4.7e+02  5.2e-02  1.0e+00
w[3]                5.8e-01  8.5e-04  2.0e-02   5.5e-01   5.8e-01   6.2e-01  5.8e+02  6.4e-02  1.0e+00
d[1]                4.3e-02  5.5e-04  1.1e-02   2.6e-02   4.3e-02   6.0e-02  3.8e+02  4.2e-02  1.0e+00
d[2]                7.1e-01  3.7e-04  9.5e-03   6.9e-01   7.1e-01   7.2e-01  6.8e+02  7.5e-02  1.0e+00
d[3]                2.5e-01  4.6e-04  1.1e-02   2.3e-01   2.5e-01   2.7e-01  5.9e+02  6.6e-02  1.0e+00

max_depth=10:

121 of 600 (20%) transitions hit the maximum treedepth limit of 10, or 2^10 leapfrog steps. 
Trajectories that are prematurely terminated due to this limit will result in slow exploration 
and you should increase the limit to ensure optimal performance.

3 chains: each with iter=(200,200,200); warmup=(0,0,0); thin=(1,1,1); 600 iterations saved.
Warmup took (21050, 20714, 20333) seconds, 17 hours total
Sampling took (3058, 5142, 4021) seconds, 3.4 hours total
                       Mean     MCSE   StdDev        5%       50%       95%    N_Eff  N_Eff/s    R_hat
lp__                1.9e+04  9.5e-01  1.4e+01   1.9e+04   1.9e+04   1.9e+04  2.1e+02  1.7e-02  1.0e+00
accept_stat__       9.8e-01  1.5e-03  3.7e-02   9.4e-01   1.0e+00   1.0e+00  5.9e+02  4.9e-02  1.0e+00
stepsize__          1.4e-02  1.4e-03  1.7e-03   1.1e-02   1.4e-02   1.5e-02  1.5e+00  1.2e-04  2.6e+14
treedepth__         9.2e+00     -nan  4.0e-01   9.0e+00   9.0e+00   1.0e+01     -nan     -nan  1.2e+00
n_leapfrog__        6.9e+02  1.2e+02  2.4e+02   5.1e+02   5.1e+02   1.0e+03  3.9e+00  3.2e-04  1.3e+00
divergent__         0.0e+00     -nan  0.0e+00   0.0e+00   0.0e+00   0.0e+00     -nan     -nan     -nan
energy__           -1.8e+04  1.3e+00  1.9e+01  -1.8e+04  -1.8e+04  -1.8e+04  2.0e+02  1.6e-02  1.0e+00
sd_y                8.1e-02  2.2e-05  6.2e-04   8.0e-02   8.1e-02   8.2e-02  8.1e+02  6.6e-02  1.0e+00
mu_u1               8.0e-02  3.6e-04  9.8e-03   6.4e-02   8.1e-02   9.6e-02  7.6e+02  6.2e-02  1.0e+00
mu_alpha            4.3e-02  7.5e-05  1.9e-03   4.0e-02   4.3e-02   4.6e-02  6.3e+02  5.1e-02  1.0e+00
beta                5.9e-01  2.9e-04  7.8e-03   5.7e-01   5.8e-01   6.0e-01  7.0e+02  5.7e-02  1.0e+00
theta               1.6e-01  1.3e-04  3.3e-03   1.5e-01   1.6e-01   1.6e-01  6.7e+02  5.5e-02  1.0e+00
sd_season           9.9e-02  1.5e-04  4.5e-03   9.1e-02   9.9e-02   1.1e-01  8.5e+02  7.0e-02  1.0e+00
mu_season[1]       -1.2e-01  3.2e-04  1.0e-02  -1.4e-01  -1.2e-01  -1.0e-01  1.0e+03  8.3e-02  1.0e+00
mu_season[2]       -6.9e-02  3.9e-04  1.1e-02  -8.7e-02  -6.9e-02  -5.1e-02  7.4e+02  6.0e-02  1.0e+00
mu_season[3]        1.4e-01  3.8e-04  1.0e-02   1.3e-01   1.4e-01   1.6e-01  7.2e+02  5.9e-02  1.0e+00
p[1]                7.1e-01  2.5e-03  6.0e-02   6.3e-01   7.0e-01   8.2e-01  5.8e+02  4.8e-02  1.0e+00
p[2]                6.2e-01  2.2e-04  5.7e-03   6.1e-01   6.2e-01   6.3e-01  6.6e+02  5.4e-02  1.0e+00
p[3]                6.9e-01  3.5e-03  8.6e-02   5.9e-01   6.8e-01   8.5e-01  5.9e+02  4.9e-02  1.0e+00
g[1]                8.5e-01  1.6e-03  4.9e-02   7.7e-01   8.5e-01   9.3e-01  9.4e+02  7.7e-02  1.0e+00
g[2]                3.3e-01  8.1e-04  2.0e-02   3.0e-01   3.3e-01   3.6e-01  6.4e+02  5.3e-02  1.0e+00
w[1]                6.4e-01  2.8e-03  6.4e-02   5.3e-01   6.4e-01   7.4e-01  5.3e+02  4.4e-02  1.0e+00
w[2]                1.5e-01  4.7e-04  1.3e-02   1.3e-01   1.5e-01   1.7e-01  7.0e+02  5.7e-02  1.0e+00
w[3]                5.8e-01  7.5e-04  2.1e-02   5.5e-01   5.8e-01   6.2e-01  8.2e+02  6.7e-02  1.0e+00
d[1]                4.4e-02  4.1e-04  1.1e-02   2.6e-02   4.4e-02   6.2e-02  6.9e+02  5.7e-02  1.0e+00
d[2]                7.1e-01  3.6e-04  9.9e-03   6.9e-01   7.1e-01   7.2e-01  7.4e+02  6.1e-02  1.0e+00
d[3]                2.5e-01  5.1e-04  1.1e-02   2.3e-01   2.5e-01   2.7e-01  4.8e+02  4.0e-02  1.0e+00

max_depth=11:

3 chains: each with iter=(200,200,200); warmup=(0,0,0); thin=(1,1,1); 600 iterations saved.
Warmup took (28354, 29356, 33217) seconds, 25 hours total
Sampling took (3291, 5990, 5615) seconds, 4.1 hours total
                       Mean     MCSE   StdDev        5%       50%       95%    N_Eff  N_Eff/s    R_hat
lp__                1.9e+04  9.4e-01  1.4e+01   1.9e+04   1.9e+04   1.9e+04  2.1e+02  1.4e-02  1.0e+00
accept_stat__       9.9e-01  7.4e-04  1.8e-02   9.6e-01   1.0e+00   1.0e+00  6.1e+02  4.1e-02  1.0e+00
stepsize__          1.1e-02  1.5e-03  1.9e-03   9.0e-03   1.0e-02   1.3e-02  1.5e+00  1.0e-04  3.5e+14
treedepth__         9.6e+00  3.5e-01  4.9e-01   9.0e+00   1.0e+01   1.0e+01  2.0e+00  1.4e-04  1.9e+00
n_leapfrog__        8.4e+02  1.7e+02  2.4e+02   5.1e+02   1.0e+03   1.0e+03  2.0e+00  1.3e-04  2.0e+00
divergent__         0.0e+00     -nan  0.0e+00   0.0e+00   0.0e+00   0.0e+00     -nan     -nan     -nan
energy__           -1.8e+04  1.4e+00  1.9e+01  -1.8e+04  -1.8e+04  -1.8e+04  1.9e+02  1.3e-02  1.0e+00
sd_y                8.1e-02  2.3e-05  6.2e-04   8.0e-02   8.1e-02   8.2e-02  7.1e+02  4.8e-02  1.0e+00
mu_u1               8.1e-02  3.8e-04  9.5e-03   6.5e-02   8.1e-02   9.6e-02  6.1e+02  4.1e-02  1.0e+00
mu_alpha            4.3e-02  8.0e-05  2.0e-03   4.0e-02   4.3e-02   4.6e-02  6.3e+02  4.2e-02  1.0e+00
beta                5.8e-01  3.1e-04  7.6e-03   5.7e-01   5.8e-01   6.0e-01  6.2e+02  4.2e-02  1.0e+00
theta               1.6e-01  1.3e-04  3.3e-03   1.5e-01   1.6e-01   1.6e-01  6.6e+02  4.4e-02  1.0e+00
sd_season           9.9e-02  1.6e-04  4.5e-03   9.2e-02   9.8e-02   1.1e-01  7.5e+02  5.0e-02  1.0e+00
mu_season[1]       -1.2e-01  3.6e-04  1.1e-02  -1.4e-01  -1.2e-01  -1.0e-01  8.5e+02  5.7e-02  1.0e+00
mu_season[2]       -6.9e-02  4.0e-04  1.0e-02  -8.6e-02  -6.9e-02  -5.2e-02  6.9e+02  4.6e-02  1.0e+00
mu_season[3]        1.4e-01  3.8e-04  1.1e-02   1.3e-01   1.4e-01   1.6e-01  8.1e+02  5.4e-02  1.0e+00
p[1]                7.0e-01  2.5e-03  5.6e-02   6.3e-01   7.0e-01   8.1e-01  5.1e+02  3.4e-02  1.0e+00
p[2]                6.2e-01  2.1e-04  5.5e-03   6.1e-01   6.2e-01   6.2e-01  6.8e+02  4.6e-02  1.0e+00
p[3]                6.9e-01  3.6e-03  8.0e-02   5.8e-01   6.8e-01   8.4e-01  5.0e+02  3.4e-02  1.0e+00
g[1]                8.6e-01  1.7e-03  4.6e-02   7.8e-01   8.5e-01   9.3e-01  7.4e+02  5.0e-02  1.0e+00
g[2]                3.3e-01  7.0e-04  2.1e-02   2.9e-01   3.3e-01   3.6e-01  9.4e+02  6.3e-02  1.0e+00
w[1]                6.4e-01  2.5e-03  6.6e-02   5.4e-01   6.4e-01   7.5e-01  7.1e+02  4.8e-02  1.0e+00
w[2]                1.5e-01  4.5e-04  1.2e-02   1.3e-01   1.5e-01   1.7e-01  7.3e+02  4.9e-02  1.0e+00
w[3]                5.8e-01  7.5e-04  2.0e-02   5.5e-01   5.8e-01   6.2e-01  7.1e+02  4.7e-02  1.0e+00
d[1]                4.4e-02  4.4e-04  1.1e-02   2.5e-02   4.4e-02   6.3e-02  6.6e+02  4.4e-02  1.0e+00
d[2]                7.1e-01  3.7e-04  9.9e-03   6.9e-01   7.1e-01   7.3e-01  7.3e+02  4.9e-02  1.0e+00
d[3]                2.5e-01  5.0e-04  1.2e-02   2.3e-01   2.5e-01   2.7e-01  5.8e+02  3.9e-02  1.0e+00
