One MCMC chain not moving

Hi,

I am fitting a complicated hierarchical reinforcement learning model, and I am having some problems with model diagnostics. Specifically, when I examine the trace plots for the group hyperparameters, one of the MCMC chains appears to be stuck (it barely moves) for most variables (except for mu_par[5]), while the other chains mix reasonably well (see below).


And here is the pairs plot. Unfortunately, the matrix is too large to fit both the mu and sigma parameters in the same plot…


I placed the following priors:

mu_par[1:8] ~ normal(0,1);  
sigma[1:8] ~ normal(0,1);
	
mu_par[9:10] ~ normal(0,2);  
sigma[9:10] ~  cauchy(0,2);

I ran the model with stepsize = 0.001 and adapt_delta = 0.999. I also noticed that one chain took less than 10 minutes to complete, while the three other chains took between half a day and one day. I am wondering whether it is the fast chain that is having the problem.
I would be grateful if anyone can suggest what steps I should take to resolve this problem and what may be the cause of this issue. Thanks!

Quite likely. You can look at the list generated by get_sampler_params(). Presumably one chain adapted very differently than the others.
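
A minimal sketch of that comparison, assuming the fitted object is called `fit`. `summarize_chains` is a hypothetical helper, not part of rstan, and the small `toy` list is made-up data with the same column names as the real output, included only so the snippet runs without a fitted model.

```r
# Hypothetical helper (not part of rstan): one summary row per chain.
# `sp` is a list of matrices, one per chain, with the column names that
# get_sampler_params() returns.
summarize_chains <- function(sp) {
  rows <- lapply(sp, function(m) {
    data.frame(
      mean_stepsize = mean(m[, "stepsize__"]),
      max_treedepth = max(m[, "treedepth__"]),
      n_divergent   = sum(m[, "divergent__"]),
      mean_energy   = mean(m[, "energy__"])
    )
  })
  out <- do.call(rbind, rows)
  cbind(chain = seq_along(sp), out)
}

# Made-up toy input (two chains, two iterations each) for illustration only.
toy <- list(
  cbind(stepsize__ = c(0.02, 0.02), treedepth__ = c(5, 6),
        divergent__ = c(0, 0), energy__ = c(10100, 10120)),
  cbind(stepsize__ = c(5e-11, 5e-11), treedepth__ = c(1, 1),
        divergent__ = c(0, 1), energy__ = c(2e54, 2e54))
)
print(summarize_chains(toy))

# With a real fit (assumed to be called `fit`), post-warmup draws only:
# sp <- rstan::get_sampler_params(fit, inc_warmup = FALSE)
# print(summarize_chains(sp))
```

A chain whose mean step size or energy differs from the others by many orders of magnitude is the one to look at first.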

I ran get_sampler_params() on the fit object and here is the output. It seems the pathological chain 3 has a very small step size and very high energy.

    [[1]]
          accept_stat__   stepsize__ treedepth__ n_leapfrog__ divergent__     energy__
     [1,]             1 7.450581e-12           2            3           0 1.306063e+19
     [2,]             0 1.001820e-02           0            1           1 1.499753e+18
     [3,]             0 9.514925e-04           0            1           1 1.499753e+18
     [4,]             0 4.885943e-05           0            1           1 1.499753e+18
     [5,]             0 1.916193e-06           0            1           1 1.499753e+18
     [6,]             0 6.717539e-08           0            1           1 1.499753e+18
     [7,]             0 2.286806e-09           0            1           1 1.499753e+18
     [8,]             0 7.918191e-11           0            1           1 1.499753e+18
     [9,]             1 2.863701e-12           4           15           0 1.492220e+18
    [10,]             1 2.583148e-12           7          127           0 5.344301e+17
    [11,]             1 2.513133e-12           1            1           0 2.224569e+17
    [12,]             1 2.585101e-12           2            3           0 2.215456e+17
    [13,]             1 2.771415e-12           3            7           0 2.177481e+17
    [14,]             1 3.064166e-12           1            1           0 2.016586e+17
    [15,]             1 3.466581e-12           1            1           0 2.005466e+17
    [16,]             1 3.989317e-12           1            1           0 1.991385e+17
    [17,]             1 4.648851e-12           1            1           0 1.972991e+17
    [18,]             1 5.466865e-12           1            1           0 1.948461e+17
    [19,]             1 6.470127e-12           2            3           0 1.914051e+17
    [20,]             1 7.690664e-12           1            1           0 1.745322e+17
    [21,]             1 9.166097e-12           2            3           0 1.689292e+17
    [22,]             1 1.094009e-11           4           15           0 1.426225e+17
    [23,]             1 1.306289e-11           2            3           0 4.811179e+16
    [24,]             1 1.559188e-11           1            1           0 4.346739e+16
    [25,]             1 1.859224e-11           1            1           0 4.201955e+16
    [26,]             1 2.213764e-11           1            1           0 4.009627e+16
    [27,]             1 2.631089e-11           1            1           0 3.761618e+16
    [28,]             1 3.120472e-11           1            1           0 3.453972e+16
    [29,]             1 3.692251e-11           2            3           0 3.057181e+16
    [30,]             1 4.357908e-11           1            1           0 1.845041e+16

    [[2]]
          accept_stat__   stepsize__ treedepth__ n_leapfrog__ divergent__  energy__
     [1,]     0.0000000 0.0080000000           0            1           1 220443.72
     [2,]     1.0000000 0.0016261601           3            7           0 214171.69
     [3,]     1.0000000 0.0009514925           3            7           0 116666.99
     [4,]     1.0000000 0.0007017947           5           31           0  78637.18
     [5,]     1.0000000 0.0005809276           4           15           0  33377.93
     [6,]     1.0000000 0.0005148232           4           15           0  31327.58
     [7,]     1.0000000 0.0004766819           4           15           0  27454.79
     [8,]     1.0000000 0.0004546284           4           15           0  25573.87
     [9,]     1.0000000 0.0004426525           5           31           0  25263.59
    [10,]     1.0000000 0.0004374081           5           31           0  24022.68
    [11,]     0.9999997 0.0004368918           5           31           0  23087.26
    [12,]     1.0000000 0.0004398329           5           31           0  20927.98
    [13,]     1.0000000 0.0004453883           6           63           0  19680.25
    [14,]     0.9998862 0.0004529745           6           63           0  18248.85
    [15,]     0.9999979 0.0004620113           7          127           0  16891.33
    [16,]     0.9992086 0.0004725150           8          255           0  14261.29
    [17,]     0.9914343 0.0004829221          11         2047           0  12756.67
    [18,]     0.9992218 0.0004825939          12         4095           0  11649.08
    [19,]     0.9862694 0.0004944094          13         8191           0  11048.24
    [20,]     0.9954384 0.0004874687          13         8191           0  10491.81
    [21,]     0.9990575 0.0004944489          13         8191           0  10249.38
    [22,]     0.9948989 0.0005071214          13         8191           0  10220.08
    [23,]     0.9974357 0.0005138497          13         8191           0  10171.28
    [24,]     0.9962128 0.0005246038          13         8191           0  10163.10
    [25,]     0.9955914 0.0005336683          13         8191           0  10138.83
    [26,]     0.9997636 0.0005418895          13         8191           0  10128.69
    [27,]     0.9999868 0.0005567270          13         8191           0  10086.87
    [28,]     0.9997677 0.0005721352          13         8191           0  10104.65
    [29,]     0.9973125 0.0005873873          13         8191           0  10112.70
    [30,]     0.9972249 0.0005987351          13         8191           0  10106.31

    [[3]]
          accept_stat__   stepsize__ treedepth__ n_leapfrog__ divergent__     energy__
     [1,]             1 1.292470e-29           1            1           0 5.797633e+54
     [2,]             0 1.001820e-02           0            1           1 2.297321e+54
     [3,]             0 9.514925e-04           0            1           1 2.297321e+54
     [4,]             0 4.885943e-05           0            1           1 2.297321e+54
     [5,]             0 1.916193e-06           0            1           1 2.297321e+54
     [6,]             0 6.717539e-08           0            1           1 2.297321e+54
     [7,]             0 2.286806e-09           0            1           1 2.297321e+54
     [8,]             0 7.918191e-11           0            1           1 2.297321e+54
     [9,]             0 2.863701e-12           0            1           1 2.297321e+54
    [10,]             0 1.098229e-13           0            1           1 2.297321e+54
    [11,]             0 4.502938e-15           0            1           1 2.297321e+54
    [12,]             0 1.981857e-16           0            1           1 2.297321e+54
    [13,]             0 9.375834e-18           0            1           1 2.297321e+54
    [14,]             0 4.766257e-19           0            1           1 2.297321e+54
    [15,]             0 2.600133e-20           0            1           1 2.297321e+54
    [16,]             0 1.519193e-21           0            1           1 2.297321e+54
    [17,]             0 9.484620e-23           0            1           1 2.297321e+54
    [18,]             0 6.311388e-24           0            1           1 2.297321e+54
    [19,]             0 4.464805e-25           0            1           1 2.297321e+54
    [20,]             0 3.349092e-26           0            1           1 2.297321e+54
    [21,]             0 2.657007e-27           0            1           1 2.297321e+54
    [22,]             0 2.223961e-28           0            1           1 2.297321e+54
    [23,]             1 1.959293e-29           1            1           0 1.973148e+54
    [24,]             1 3.316111e-29           2            3           0 8.024776e+53
    [25,]             1 5.596571e-29           1            1           0 2.762145e+53
    [26,]             1 9.404855e-29           1            1           0 9.471475e+52
    [27,]             1 1.571930e-28           1            1           0 3.281367e+52
    [28,]             1 2.610904e-28           2            3           0 1.150290e+52
    [29,]             1 4.306664e-28           1            1           0 4.086772e+51
    [30,]             1 7.051335e-28           7          127           0 1.473304e+51

    [[4]]
          accept_stat__   stepsize__ treedepth__ n_leapfrog__ divergent__     energy__
     [1,]             1 1.907349e-09           1            1           0 4.367587e+14
     [2,]             0 1.001820e-02           0            1           1 2.168011e+13
     [3,]             0 9.514925e-04           0            1           1 2.168011e+13
     [4,]             0 4.885943e-05           0            1           1 2.168011e+13
     [5,]             0 1.916193e-06           0            1           1 2.168011e+13
     [6,]             0 6.717539e-08           0            1           1 2.168011e+13
     [7,]             1 2.286806e-09           2            3           0 2.103727e+13
     [8,]             1 1.780050e-09           2            3           0 1.049278e+13
     [9,]             1 1.536878e-09           2            3           0 8.392762e+12
    [10,]             1 1.429097e-09           1            1           0 7.337860e+12
    [11,]             1 1.402603e-09           2            3           0 7.143783e+12
    [12,]             1 1.432455e-09           1            1           0 6.490294e+12
    [13,]             1 1.506772e-09           1            1           0 6.341383e+12
    [14,]             1 1.620191e-09           2            3           0 6.177983e+12
    [15,]             1 1.770938e-09           6           63           0 5.509423e+12
    [16,]             1 1.959400e-09           1            1           0 2.176023e+12
    [17,]             1 2.187394e-09           1            1           0 2.144719e+12
    [18,]             1 2.457771e-09           3            7           0 2.102302e+12
    [19,]             1 2.774181e-09           2            3           0 1.508265e+12
    [20,]             1 3.140953e-09           1            1           0 1.393930e+12
    [21,]             1 3.563005e-09           1            1           0 1.361079e+12
    [22,]             1 4.045802e-09           3            7           0 1.314966e+12
    [23,]             1 4.595326e-09           4           15           0 7.621139e+11
    [24,]             1 5.218050e-09           1            1           0 3.220273e+11
    [25,]             1 5.920936e-09           2            3           0 3.170939e+11
    [26,]             1 6.711416e-09           1            1           0 2.943347e+11
    [27,]             1 7.597392e-09           1            1           0 2.877424e+11
    [28,]             1 8.587230e-09           3            7           0 2.785319e+11
    [29,]             1 9.689752e-09           3            7           0 1.665425e+11
    [30,]             1 1.091423e-08           3            7           0 1.120353e+11

I also ran get_adaptation_info(). Chain 3 clearly behaves differently from the other chains.

[[1]]
[1] "# Adaptation terminated\n# Step size = 0.0173449\n# Diagonal elements of inverse mass matrix:\n# 0.12797, 0.207466, 0.0583861, 0.262218, 0.00274245, 0.0397579, 0.00743187, 0.0350643, 0.000489024, 0.000437928, 0.022354, 0.0277263, 0.0339336, 0.0482893, 0.230905, 0.0513686, 0.0457979, 0.798876, 0.184697, 1.32798, 0.0661912, 0.0772531, 0.195713, 0.0933529, 0.217419, 0.126162, 0.0941312, 0.223627, 0.0877216, 0.128143, 0.421813, 0.0891986, 0.0764124, 0.182436, 0.0938813, 0.0874292, 0.113718, 0.072941, 0.144466, 0.08523, 1.36977, 0.161513, 0.117778, 0.0814578, 0.0813858, 0.0887076, 0.107297, 0.0942244, 0.262298, 0.0753033, 0.293906, 0.229777, 0.117618, 0.126842, 0.20437, 0.263833, 0.276794, 0.214986, 0.153346, 0.151326, 0.272189, 0.149247, 0.0896074, 0.294099, 0.341472, 0.22304, 0.517192, 0.265631, 0.285168, 0.177408, 0.0806569, 0.0603476, 0.124002, 0.166297, 0.109847, 0.119436, 0.329544, 0.212882, 0.282552, 0.245942, 0.13614, 0.114498, 0.38946, 0.148072, 0.111022, 0.107911, 0.109772, 0... <truncated>

[[2]]
[1] "# Adaptation terminated\n# Step size = 0.0209994\n# Diagonal elements of inverse mass matrix:\n# 0.115718, 0.205785, 0.0537171, 0.244166, 0.00251715, 0.030784, 0.0075802, 0.0350458, 0.000519418, 0.000411778, 0.0228099, 0.0292809, 0.0376576, 0.0442331, 0.153228, 0.0532738, 0.0496571, 1.16481, 0.158387, 1.20273, 0.0565523, 0.0919289, 0.183975, 0.0947284, 0.204828, 0.138392, 0.0980136, 0.253032, 0.0799498, 0.171863, 0.413476, 0.103841, 0.0791864, 0.158471, 0.0881689, 0.0898936, 0.0998266, 0.0648375, 0.131384, 0.0835116, 1.0458, 0.172109, 0.113653, 0.0843148, 0.0817037, 0.0844458, 0.106355, 0.090414, 0.25457, 0.0675009, 0.242018, 0.177575, 0.0974645, 0.135264, 0.262727, 0.219498, 0.294108, 0.18109, 0.134527, 0.170718, 0.252306, 0.215155, 0.0913311, 0.327824, 0.38818, 0.268098, 0.365373, 0.296492, 0.323953, 0.168218, 0.0737963, 0.0632496, 0.1028, 0.154365, 0.102654, 0.124928, 0.321451, 0.235798, 0.274926, 0.282784, 0.135013, 0.130868, 0.29872, 0.145211, 0.11884, 0.118941, 0.111523, 0.4... <truncated>

[[3]]
[1] "# Adaptation terminated\n# Step size = 4.83033e-11\n# Diagonal elements of inverse mass matrix:\n# 9.90099e-06, 9.90101e-06, 9.90099e-06, 9.90099e-06, 0.00234474, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.91011e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, 9.90099e-06, ... <truncated>

[[4]]
[1] "# Adaptation terminated\n# Step size = 0.0239955\n# Diagonal elements of inverse mass matrix:\n# 0.11274, 0.186193, 0.0549136, 0.262421, 0.00312959, 0.0360401, 0.00655043, 0.0281247, 0.000581819, 0.000432995, 0.0234177, 0.0286032, 0.036863, 0.0515373, 0.353247, 0.0465309, 0.0419241, 1.1369, 0.136503, 1.20595, 0.0570064, 0.08132, 0.208696, 0.0785333, 0.175273, 0.125332, 0.0889874, 0.315583, 0.0905617, 0.143229, 0.448736, 0.10262, 0.0840417, 0.212294, 0.0953821, 0.0848, 0.102528, 0.0618765, 0.12662, 0.0817934, 1.14671, 0.196956, 0.133612, 0.0856085, 0.0832608, 0.0957021, 0.104368, 0.0980542, 0.265415, 0.0589645, 0.288139, 0.22046, 0.132964, 0.129785, 0.318438, 0.213677, 0.299509, 0.213899, 0.12354, 0.174327, 0.279038, 0.173186, 0.0745602, 0.327956, 0.290788, 0.230392, 0.37371, 0.292577, 0.305679, 0.168716, 0.072329, 0.0526341, 0.102998, 0.152814, 0.0998614, 0.116766, 0.373235, 0.231323, 0.262656, 0.242449, 0.135874, 0.127976, 0.276872, 0.156424, 0.129711, 0.11855, 0.116443, 0.417212... <truncated>
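
The adapted step sizes can be pulled out of these strings and compared directly. The following is a rough sketch: `adapted_stepsize` is a hypothetical helper, and the `info` list copies only the leading part of each string above, with the mass matrix entries left out.

```r
# Hypothetical helper: extract the adapted step size from each chain's
# adaptation string, in the format returned by get_adaptation_info().
adapted_stepsize <- function(info) {
  vapply(info, function(s) {
    m <- regmatches(s, regexpr("Step size = [0-9.eE+-]+", s))
    as.numeric(sub("Step size = ", "", m))
  }, numeric(1), USE.NAMES = FALSE)
}

# Leading part of the four strings shown above (mass matrices omitted).
info <- list(
  "# Adaptation terminated\n# Step size = 0.0173449\n",
  "# Adaptation terminated\n# Step size = 0.0209994\n",
  "# Adaptation terminated\n# Step size = 4.83033e-11\n",
  "# Adaptation terminated\n# Step size = 0.0239955\n"
)
step <- adapted_stepsize(info)
print(step)

# Flag chains whose step size is orders of magnitude below the median.
print(which(step < 1e-3 * median(step)))

# With a real fit (assumed to be called `fit`):
# step <- adapted_stepsize(rstan::get_adaptation_info(fit))
```

The factor of 1e-3 is an arbitrary threshold chosen for illustration. Here it singles out chain 3, whose step size is roughly nine orders of magnitude smaller than the others.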

What does this imply and is there any parameter I can change that can remedy this?

Thanks!

Often this can be avoided by setting init_r to some number less than its default value of 2. However, the fact that this issue can arise suggests that this posterior distribution is difficult to draw from (and the fact that divergent__ is often 1 confirms it).

Thanks, I will try that. Do I need to specify different inits for parameters too, or start with init_r first?

Start with init_r

Are there any potential adverse consequences of setting init_r too low? Does it affect the distributions of the starting values for the parameters? Ideally, I would like them to be dispersed enough for potential scale reduction factors to still be valid.

You might miss a mode, but I would say in general that the default value of init_r of 2 is more likely to be too big than too small.
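
To make the effect on starting values concrete: with random inits, Stan draws each unconstrained parameter uniformly on (-init_r, init_r). The sketch below assumes sigma is declared with `<lower=0>` (the model code is not shown in the thread), in which case the starting value on the constrained scale is exp(u). `init_range` is a hypothetical helper, and 0.5 is an illustrative value rather than a recommendation.

```r
# Range of starting values implied by init_r.
# Assumption: the parameter is declared with <lower=0>, so Stan draws
# u ~ uniform(-init_r, init_r) on the log scale and starts at exp(u).
init_range <- function(init_r) {
  c(unconstrained_lo = -init_r, unconstrained_hi = init_r,
    positive_lo = exp(-init_r), positive_hi = exp(init_r))
}

print(round(init_range(2), 3))    # default init_r
print(round(init_range(0.5), 3))  # a smaller, illustrative choice

# Simulated starting values for four chains at init_r = 0.5.
set.seed(1)
u <- runif(4, min = -0.5, max = 0.5)
print(exp(u))
```

So a smaller init_r still gives each chain a different random starting point, only over a narrower interval. In rstan the value is passed as an argument, for example `init_r = 0.5`, to `stan()` or `sampling()`.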
