One MCMC chain not moving

wzhong · October 3, 2017, 10:17pm

Hi,

I am fitting a complicated hierarchical reinforcement learning model, and I am having some problems with model diagnostics. Specifically, when I examine the trace plots for the group hyperparameters, one of the MCMC chains seems to be stationary for most variables (except for mu_par[5]) while the other chains mix reasonably well (see below).

And here are the pairs plots. Unfortunately, the matrix is too large to fit both the mu and sigma parameters in the same plot.

I placed the following priors:

mu_par[1:8] ~ normal(0, 1);
sigma[1:8] ~ normal(0, 1);

mu_par[9:10] ~ normal(0, 2);
sigma[9:10] ~ cauchy(0, 2);

I ran the model with stepsize = 0.001 and adapt_delta = 0.999. I also noticed that one chain took less than 10 minutes to complete, while each of the other three chains took between half a day and a full day. Could the fast chain be the one with the problem?
I would be grateful if anyone could suggest what steps I should take to resolve this problem and what might be causing it. Thanks!

bgoodri · October 4, 2017, 1:35am

Quite likely. You can look at the list generated by get_sampler_params(). Presumably one chain adapted very differently than the others.
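One way to act on this suggestion is to tabulate a few columns per chain. The Python sketch below (Python rather than R so it runs without a fitted model) uses the n_leapfrog__ and divergent__ columns from the first 30 rows of each chain, as posted later in this thread and copied by hand; the dictionary and function names are hypothetical. The total number of leapfrog steps is a rough proxy for gradient evaluations, and therefore for run time, so it is a natural first check when one chain finishes far faster than the others.

```python
"""Per-chain summary of sampler diagnostics (sketch; values copied from the thread)."""

# n_leapfrog__ column, rows 1-30, for each chain
n_leapfrog = {
    1: [3, 1, 1, 1, 1, 1, 1, 1, 15, 127, 1, 3, 7, 1, 1,
        1, 1, 1, 3, 1, 3, 15, 3, 1, 1, 1, 1, 1, 3, 1],
    2: [1, 7, 7, 31, 15, 15, 15, 15, 31, 31, 31, 31, 63, 63, 127,
        255, 2047, 4095] + [8191] * 12,
    3: [1] * 22 + [1, 3, 1, 1, 1, 3, 1, 127],
    4: [1] * 6 + [3, 3, 3, 1, 3, 1, 1, 3, 63, 1, 1, 7,
                  3, 1, 1, 7, 15, 1, 3, 1, 1, 7, 7, 7],
}

# divergent__ column, rows 1-30, for each chain
divergent = {
    1: [0] + [1] * 7 + [0] * 22,
    2: [1] + [0] * 29,
    3: [0] + [1] * 21 + [0] * 8,
    4: [0] + [1] * 5 + [0] * 24,
}


def summarize(leapfrog, div):
    """Totals that hint at computational cost and at how often the sampler failed."""
    return {
        "total_leapfrog": sum(leapfrog),
        "mean_leapfrog": sum(leapfrog) / len(leapfrog),
        "divergences": sum(div),
    }


summaries = {c: summarize(n_leapfrog[c], divergent[c]) for c in n_leapfrog}
for chain, s in summaries.items():
    print(f"chain {chain}: {s['total_leapfrog']:>7} leapfrog steps, "
          f"{s['divergences']:>2} divergences in 30 rows")
```

In these 30 rows, chain 2 already used 105,172 leapfrog steps (tree depth 13 on most iterations), while chain 3 used 160, mostly one step per iteration. That pattern is consistent with chain 3 being the fast chain, although 30 rows alone cannot confirm it.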

wzhong · October 4, 2017, 4:29pm

I ran get_sampler_params() on the fit object; here are the first 30 rows for each chain. It seems that the pathological chain 3 has a very small step size and a very high energy.

    [[1]]
          accept_stat__   stepsize__ treedepth__ n_leapfrog__ divergent__     energy__
     [1,]             1 7.450581e-12           2            3           0 1.306063e+19
     [2,]             0 1.001820e-02           0            1           1 1.499753e+18
     [3,]             0 9.514925e-04           0            1           1 1.499753e+18
     [4,]             0 4.885943e-05           0            1           1 1.499753e+18
     [5,]             0 1.916193e-06           0            1           1 1.499753e+18
     [6,]             0 6.717539e-08           0            1           1 1.499753e+18
     [7,]             0 2.286806e-09           0            1           1 1.499753e+18
     [8,]             0 7.918191e-11           0            1           1 1.499753e+18
     [9,]             1 2.863701e-12           4           15           0 1.492220e+18
    [10,]             1 2.583148e-12           7          127           0 5.344301e+17
    [11,]             1 2.513133e-12           1            1           0 2.224569e+17
    [12,]             1 2.585101e-12           2            3           0 2.215456e+17
    [13,]             1 2.771415e-12           3            7           0 2.177481e+17
    [14,]             1 3.064166e-12           1            1           0 2.016586e+17
    [15,]             1 3.466581e-12           1            1           0 2.005466e+17
    [16,]             1 3.989317e-12           1            1           0 1.991385e+17
    [17,]             1 4.648851e-12           1            1           0 1.972991e+17
    [18,]             1 5.466865e-12           1            1           0 1.948461e+17
    [19,]             1 6.470127e-12           2            3           0 1.914051e+17
    [20,]             1 7.690664e-12           1            1           0 1.745322e+17
    [21,]             1 9.166097e-12           2            3           0 1.689292e+17
    [22,]             1 1.094009e-11           4           15           0 1.426225e+17
    [23,]             1 1.306289e-11           2            3           0 4.811179e+16
    [24,]             1 1.559188e-11           1            1           0 4.346739e+16
    [25,]             1 1.859224e-11           1            1           0 4.201955e+16
    [26,]             1 2.213764e-11           1            1           0 4.009627e+16
    [27,]             1 2.631089e-11           1            1           0 3.761618e+16
    [28,]             1 3.120472e-11           1            1           0 3.453972e+16
    [29,]             1 3.692251e-11           2            3           0 3.057181e+16
    [30,]             1 4.357908e-11           1            1           0 1.845041e+16

    [[2]]
          accept_stat__   stepsize__ treedepth__ n_leapfrog__ divergent__  energy__
     [1,]     0.0000000 0.0080000000           0            1           1 220443.72
     [2,]     1.0000000 0.0016261601           3            7           0 214171.69
     [3,]     1.0000000 0.0009514925           3            7           0 116666.99
     [4,]     1.0000000 0.0007017947           5           31           0  78637.18
     [5,]     1.0000000 0.0005809276           4           15           0  33377.93
     [6,]     1.0000000 0.0005148232           4           15           0  31327.58
     [7,]     1.0000000 0.0004766819           4           15           0  27454.79
     [8,]     1.0000000 0.0004546284           4           15           0  25573.87
     [9,]     1.0000000 0.0004426525           5           31           0  25263.59
    [10,]     1.0000000 0.0004374081           5           31           0  24022.68
    [11,]     0.9999997 0.0004368918           5           31           0  23087.26
    [12,]     1.0000000 0.0004398329           5           31           0  20927.98
    [13,]     1.0000000 0.0004453883           6           63           0  19680.25
    [14,]     0.9998862 0.0004529745           6           63           0  18248.85
    [15,]     0.9999979 0.0004620113           7          127           0  16891.33
    [16,]     0.9992086 0.0004725150           8          255           0  14261.29
    [17,]     0.9914343 0.0004829221          11         2047           0  12756.67
    [18,]     0.9992218 0.0004825939          12         4095           0  11649.08
    [19,]     0.9862694 0.0004944094          13         8191           0  11048.24
    [20,]     0.9954384 0.0004874687          13         8191           0  10491.81
    [21,]     0.9990575 0.0004944489          13         8191           0  10249.38
    [22,]     0.9948989 0.0005071214          13         8191           0  10220.08
    [23,]     0.9974357 0.0005138497          13         8191           0  10171.28
    [24,]     0.9962128 0.0005246038          13         8191           0  10163.10
    [25,]     0.9955914 0.0005336683          13         8191           0  10138.83
    [26,]     0.9997636 0.0005418895          13         8191           0  10128.69
    [27,]     0.9999868 0.0005567270          13         8191           0  10086.87
    [28,]     0.9997677 0.0005721352          13         8191           0  10104.65
    [29,]     0.9973125 0.0005873873          13         8191           0  10112.70
    [30,]     0.9972249 0.0005987351          13         8191           0  10106.31

    [[3]]
          accept_stat__   stepsize__ treedepth__ n_leapfrog__ divergent__     energy__
     [1,]             1 1.292470e-29           1            1           0 5.797633e+54
     [2,]             0 1.001820e-02           0            1           1 2.297321e+54
     [3,]             0 9.514925e-04           0            1           1 2.297321e+54
     [4,]             0 4.885943e-05           0            1           1 2.297321e+54
     [5,]             0 1.916193e-06           0            1           1 2.297321e+54
     [6,]             0 6.717539e-08           0            1           1 2.297321e+54
     [7,]             0 2.286806e-09           0            1           1 2.297321e+54
     [8,]             0 7.918191e-11           0            1           1 2.297321e+54
     [9,]             0 2.863701e-12           0            1           1 2.297321e+54
    [10,]             0 1.098229e-13           0            1           1 2.297321e+54
    [11,]             0 4.502938e-15           0            1           1 2.297321e+54
    [12,]             0 1.981857e-16           0            1           1 2.297321e+54
    [13,]             0 9.375834e-18           0            1           1 2.297321e+54
    [14,]             0 4.766257e-19           0            1           1 2.297321e+54
    [15,]             0 2.600133e-20           0            1           1 2.297321e+54
    [16,]             0 1.519193e-21           0            1           1 2.297321e+54
    [17,]             0 9.484620e-23           0            1           1 2.297321e+54
    [18,]             0 6.311388e-24           0            1           1 2.297321e+54
    [19,]             0 4.464805e-25           0            1           1 2.297321e+54
    [20,]             0 3.349092e-26           0            1           1 2.297321e+54
    [21,]             0 2.657007e-27           0            1           1 2.297321e+54
    [22,]             0 2.223961e-28           0            1           1 2.297321e+54
    [23,]             1 1.959293e-29           1            1           0 1.973148e+54
    [24,]             1 3.316111e-29           2            3           0 8.024776e+53
    [25,]             1 5.596571e-29           1            1           0 2.762145e+53
    [26,]             1 9.404855e-29           1            1           0 9.471475e+52
    [27,]             1 1.571930e-28           1            1           0 3.281367e+52
    [28,]             1 2.610904e-28           2            3           0 1.150290e+52
    [29,]             1 4.306664e-28           1            1           0 4.086772e+51
    [30,]             1 7.051335e-28           7          127           0 1.473304e+51

    [[4]]
          accept_stat__   stepsize__ treedepth__ n_leapfrog__ divergent__     energy__
     [1,]             1 1.907349e-09           1            1           0 4.367587e+14
     [2,]             0 1.001820e-02           0            1           1 2.168011e+13
     [3,]             0 9.514925e-04           0            1           1 2.168011e+13
     [4,]             0 4.885943e-05           0            1           1 2.168011e+13
     [5,]             0 1.916193e-06           0            1           1 2.168011e+13
     [6,]             0 6.717539e-08           0            1           1 2.168011e+13
     [7,]             1 2.286806e-09           2            3           0 2.103727e+13
     [8,]             1 1.780050e-09           2            3           0 1.049278e+13
     [9,]             1 1.536878e-09           2            3           0 8.392762e+12
    [10,]             1 1.429097e-09           1            1           0 7.337860e+12
    [11,]             1 1.402603e-09           2            3           0 7.143783e+12
    [12,]             1 1.432455e-09           1            1           0 6.490294e+12
    [13,]             1 1.506772e-09           1            1           0 6.341383e+12
    [14,]             1 1.620191e-09           2            3           0 6.177983e+12
    [15,]             1 1.770938e-09           6           63           0 5.509423e+12
    [16,]             1 1.959400e-09           1            1           0 2.176023e+12
    [17,]             1 2.187394e-09           1            1           0 2.144719e+12
    [18,]             1 2.457771e-09           3            7           0 2.102302e+12
    [19,]             1 2.774181e-09           2            3           0 1.508265e+12
    [20,]             1 3.140953e-09           1            1           0 1.393930e+12
    [21,]             1 3.563005e-09           1            1           0 1.361079e+12
    [22,]             1 4.045802e-09           3            7           0 1.314966e+12
    [23,]             1 4.595326e-09           4           15           0 7.621139e+11
    [24,]             1 5.218050e-09           1            1           0 3.220273e+11
    [25,]             1 5.920936e-09           2            3           0 3.170939e+11
    [26,]             1 6.711416e-09           1            1           0 2.943347e+11
    [27,]             1 7.597392e-09           1            1           0 2.877424e+11
    [28,]             1 8.587230e-09           3            7           0 2.785319e+11
    [29,]             1 9.689752e-09           3            7           0 1.665425e+11
    [30,]             1 1.091423e-08           3            7           0 1.120353e+11
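The collapse of chain 3's step size can be replayed with the dual-averaging update Stan uses during warmup (the Hoffman and Gelman scheme). The Python sketch below is my reading of that update, assuming Stan's defaults gamma = 0.05 and t0 = 10, the target adapt_delta = 0.999, and a shrinkage target mu = log(10 * 0.001) taken from the user-supplied stepsize = 0.001. It only covers the early warmup iterations, because Stan resets mu after each metric-adaptation window. Feeding it chain 3's accept_stat__ column (a 1, then twenty-one 0s from divergent iterations) should reproduce the printed stepsize__ values for rows 2 to 23 to within about 0.1%. Each divergence pushes the running statistic up, and because that statistic is multiplied by sqrt(counter)/gamma, the step size falls by a factor of roughly 10 to 30 per divergent iteration. Chains 1 and 4 show the same sequence in their divergent rows because they had the same acceptance history. The function name replay_stepsizes is hypothetical.

```python
import math


def replay_stepsizes(accept_stats, user_stepsize=0.001, delta=0.999,
                     gamma=0.05, t0=10.0):
    """Replay dual-averaging step-size adaptation (sketch of Stan's update).

    Returns, for each accept statistic, the step size proposed for the next
    iteration. Stan also keeps a running average of the log step size, which
    it switches to when warmup ends; that part is omitted here because it
    does not affect the per-iteration values shown by get_sampler_params().
    """
    mu = math.log(10.0 * user_stepsize)  # shrinkage target for the log step size
    s_bar = 0.0                          # running average of (delta - accept_stat)
    proposed = []
    for counter, accept in enumerate(accept_stats, start=1):
        accept = min(accept, 1.0)
        eta = 1.0 / (counter + t0)
        s_bar = (1.0 - eta) * s_bar + eta * (delta - accept)
        log_eps = mu - s_bar * math.sqrt(counter) / gamma
        proposed.append(math.exp(log_eps))
    return proposed


# Chain 3, rows 1-22: one accepted iteration followed by 21 divergent ones
chain3_accept = [1.0] + [0.0] * 21

# stepsize__ printed for chain 3, rows 2-23
chain3_printed = [
    1.001820e-02, 9.514925e-04, 4.885943e-05, 1.916193e-06, 6.717539e-08,
    2.286806e-09, 7.918191e-11, 2.863701e-12, 1.098229e-13, 4.502938e-15,
    1.981857e-16, 9.375834e-18, 4.766257e-19, 2.600133e-20, 1.519193e-21,
    9.484620e-23, 6.311388e-24, 4.464805e-25, 3.349092e-26, 2.657007e-27,
    2.223961e-28, 1.959293e-29,
]

replayed = replay_stepsizes(chain3_accept)
for row, (got, printed) in enumerate(zip(replayed, chain3_printed), start=2):
    print(f"row {row:>2}: replayed {got:.6e}  printed {printed:.6e}")
```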

I also ran get_adaptation_info(), and chain 3 clearly behaves differently from the other chains.

[[1]]
[1] "# Adaptation terminated\n# Step size = 0.0173449\n# Diagonal elements of inverse mass matrix:\n# 0.12797, 0.207466, 0.0583861, 0.262218, 0.00274245, 0.0397579, 0.00743187, 0.0350643, 0.000489024, 0.000437928, 0.022354, 0.0277263, 0.0339336, 0.0482893, 0.230905, 0.0513686, 0.0457979, 0.798876, 0.184697, 1.32798, 0.0661912, 0.0772531, 0.195713, 0.0933529, 0.217419, 0.126162, 0.0941312, 0.223627, 0.0877216, 0.128143, 0.421813, 0.0891986, 0.0764124, 0.182436, 0.0938813, 0.0874292, 0.113718, 0.072941, 0.144466, 0.08523, 1.36977, 0.161513, 0.117778, 0.0814578, 0.0813858, 0.0887076, 0.107297, 0.0942244, 0.262298, 0.0753033, 0.293906, 0.229777, 0.117618, 0.126842, 0.20437, 0.263833, 0.276794, 0.214986, 0.153346, 0.151326, 0.272189, 0.149247, 0.0896074, 0.294099, 0.341472, 0.22304, 0.517192, 0.265631, 0.285168, 0.177408, 0.0806569, 0.0603476, 0.124002, 0.166297, 0.109847, 0.119436, 0.329544, 0.212882, 0.282552, 0.245942, 0.13614, 0.114498, 0.38946, 0.148072, 0.111022, 0.107911, 0.109772, 0... <truncated>

[[2]]
[1] "# Adaptation terminated\n# Step size = 0.0209994\n# Diagonal elements of inverse mass matrix:\n# 0.115718, 0.205785, 0.0537171, 0.244166, 0.00251715, 0.030784, 0.0075802, 0.0350458, 0.000519418, 0.000411778, 0.0228099, 0.0292809, 0.0376576, 0.0442331, 0.153228, 0.0532738, 0.0496571, 1.16481, 0.158387, 1.20273, 0.0565523, 0.0919289, 0.183975, 0.0947284, 0.204828, 0.138392, 0.0980136, 0.253032, 0.0799498, 0.171863, 0.413476, 0.103841, 0.0791864, 0.158471, 0.0881689, 0.0898936, 0.0998266, 0.0648375, 0.131384, 0.0835116, 1.0458, 0.172109, 0.113653, 0.0843148, 0.0817037, 0.0844458, 0.106355, 0.090414, 0.25457, 0.0675009, 0.242018, 0.177575, 0.0974645, 0.135264, 0.262727, 0.219498, 0.294108, 0.18109, 0.134527, 0.170718, 0.252306, 0.215155, 0.0913311, 0.327824, 0.38818, 0.268098, 0.365373, 0.296492, 0.323953, 0.168218, 0.0737963, 0.0632496, 0.1028, 0.154365, 0.102654, 0.124928, 0.321451, 0.235798, 0.274926, 0.282784, 0.135013, 0.130868, 0.29872, 0.145211, 0.11884, 0.118941, 0.111523, 0.4... <truncated>

[[3]]
[1] "# Adaptation terminated\n# Step size = 4.83033e-11\n# Diagonal elements of inverse mass matrix:\n# 9.90099e-06, 9.90101e-06, 9.90099e-06, 9.90099e-06, 0.00234474, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.91011e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, ... <truncated>

[[4]]
[1] "# Adaptation terminated\n# Step size = 0.0239955\n# Diagonal elements of inverse mass matrix:\n# 0.11274, 0.186193, 0.0549136, 0.262421, 0.00312959, 0.0360401, 0.00655043, 0.0281247, 0.000581819, 0.000432995, 0.0234177, 0.0286032, 0.036863, 0.0515373, 0.353247, 0.0465309, 0.0419241, 1.1369, 0.136503, 1.20595, 0.0570064, 0.08132, 0.208696, 0.0785333, 0.175273, 0.125332, 0.0889874, 0.315583, 0.0905617, 0.143229, 0.448736, 0.10262, 0.0840417, 0.212294, 0.0953821, 0.0848, 0.102528, 0.0618765, 0.12662, 0.0817934, 1.14671, 0.196956, 0.133612, 0.0856085, 0.0832608, 0.0957021, 0.104368, 0.0980542, 0.265415, 0.0589645, 0.288139, 0.22046, 0.132964, 0.129785, 0.318438, 0.213677, 0.299509, 0.213899, 0.12354, 0.174327, 0.279038, 0.173186, 0.0745602, 0.327956, 0.290788, 0.230392, 0.37371, 0.292577, 0.305679, 0.168716, 0.072329, 0.0526341, 0.102998, 0.152814, 0.0998614, 0.116766, 0.373235, 0.231323, 0.262656, 0.242449, 0.135874, 0.127976, 0.276872, 0.156424, 0.129711, 0.11855, 0.116443, 0.417212... <truncated>
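Two features of this output can be checked with a little arithmetic. First, chain 3's adapted step size is about nine orders of magnitude below the others. Second, almost every entry of its inverse-metric diagonal is 9.90099e-06. As I understand Stan's diagonal metric adaptation, each windowed variance estimate is regularized as (n / (n + 5)) * var + 1e-3 * 5 / (n + 5), so a sample variance of exactly zero over n = 500 draws gives 1e-3 * 5 / 505, which is about 9.90099e-06. Assuming the last metric window had 500 draws (the default with 1000 warmup iterations; the thread does not state the warmup length), that value is what you would expect if chain 3 did not move at all during that window. The Python sketch below extracts the step sizes from the header lines of the strings above and flags chains whose step size is more than three orders of magnitude below the median; the helper names are hypothetical.

```python
import math
import re
import statistics

# Header lines of the get_adaptation_info() strings above (the rest is truncated)
adaptation_info = {
    1: "# Adaptation terminated\n# Step size = 0.0173449\n",
    2: "# Adaptation terminated\n# Step size = 0.0209994\n",
    3: "# Adaptation terminated\n# Step size = 4.83033e-11\n",
    4: "# Adaptation terminated\n# Step size = 0.0239955\n",
}


def step_size(info):
    """Pull the adapted step size out of one adaptation-info string."""
    match = re.search(r"Step size = ([0-9.eE+-]+)", info)
    return float(match.group(1))


def flag_small_step_sizes(infos, orders=3.0):
    """Return chains whose log10 step size is more than `orders` below the median."""
    logs = {c: math.log10(step_size(s)) for c, s in infos.items()}
    med = statistics.median(logs.values())
    return [c for c, v in logs.items() if v < med - orders]


def regularized_variance(sample_var, n):
    """Shrink a windowed variance estimate toward 1e-3 (diagonal metric)."""
    return (n / (n + 5.0)) * sample_var + 1e-3 * (5.0 / (n + 5.0))


print("chains with collapsed step size:", flag_small_step_sizes(adaptation_info))
# A parameter that never moves during an (assumed) 500-draw window
print("inverse-metric entry for zero variance:", regularized_variance(0.0, 500))
```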

What does this imply and is there any parameter I can change that can remedy this?

Thanks!

bgoodri · October 4, 2017, 4:37pm

Often this can be avoided by setting init_r to some number less than its default value of 2. However, the fact that this issue can arise at all suggests that this posterior distribution is difficult to draw from (and the fact that divergent__ is often 1 confirms it).
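For context on this suggestion: when no explicit inits are supplied, Stan draws each parameter's initial value uniformly from (-init_r, init_r) on the unconstrained scale, and in rstan init_r is an argument to stan() and sampling(). For a parameter declared with <lower=0>, the constrained value is the exponential of the unconstrained one, so init_r = 2 allows initial values anywhere in roughly (0.135, 7.39), while init_r = 0.5 narrows that to roughly (0.607, 1.65). The Python sketch below mimics that draw. Whether sigma is declared with <lower=0> in this model is an assumption (the post only shows its priors), and draw_sigma_inits is a hypothetical helper.

```python
import math
import random


def draw_sigma_inits(init_r, n_draws, seed=1):
    """Mimic random inits for a <lower=0> parameter (hypothetical helper).

    Draws u ~ Uniform(-init_r, init_r) on the unconstrained scale and maps
    it back with sigma = exp(u), the transform for a lower bound of zero.
    """
    rng = random.Random(seed)
    return [math.exp(rng.uniform(-init_r, init_r)) for _ in range(n_draws)]


for init_r in (2.0, 0.5):
    draws = draw_sigma_inits(init_r, 10_000)
    print(f"init_r={init_r}: bounds ({math.exp(-init_r):.3f}, "
          f"{math.exp(init_r):.3f}), sampled range "
          f"({min(draws):.3f}, {max(draws):.3f})")
```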

wzhong · October 4, 2017, 5:01pm

Thanks, I will try that. Do I also need to specify inits for the parameters, or should I start with init_r?

bgoodri · October 4, 2017, 5:57pm

Start with init_r.

wlandau · November 11, 2020, 6:17pm

Are there any potential adverse consequences of setting init_r too low? Does it affect the distributions of the starting values for the parameters? Ideally, I would like them to be dispersed enough for potential scale reduction factors to still be valid.

bgoodri · November 11, 2020, 11:31pm

You might miss a mode, but I would say in general that the default value of init_r of 2 is more likely to be too big than too small.
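To see why dispersed starting points matter for the potential scale reduction factor, recall what it compares: the variance between chain means against the variance within chains. The Python sketch below computes the classic (non-split) Gelman-Rubin R-hat on simulated draws. Stan reports a refined split version, and the simulated chains (three mixing chains plus one frozen at 5.0, loosely mimicking chain 3) are made up for illustration. A chain stuck away from the others drives R-hat far above 1. Conversely, if every chain starts near the same mode and never leaves it, R-hat can look fine even though another mode was missed, which is the risk mentioned above.

```python
import random
import statistics


def rhat(chains):
    """Classic Gelman-Rubin potential scale reduction factor (non-split)."""
    m = len(chains)
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(means)
    # Between-chain variance B and mean within-chain variance W
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    w = statistics.fmean([statistics.variance(c) for c in chains])
    var_plus = (n - 1) / n * w + b / n
    return (var_plus / w) ** 0.5


rng = random.Random(2017)
mixing = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Replace the last chain with one that never leaves its starting value
with_stuck = mixing[:3] + [[5.0] * 1000]

print(f"R-hat, four mixing chains:     {rhat(mixing):.3f}")
print(f"R-hat, one chain stuck at 5.0: {rhat(with_stuck):.3f}")
```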
