Checking if sampling uses initial values?

With the current Stan implementation, you are unlikely to see much difference in warmup speed, except for very challenging posteriors or when the default initialization would start in a region where, e.g., an ODE solver would struggle.

Stan warmup uses a fixed number of iterations, and the default number has been chosen to be safe and conservative. It would be possible to use an adaptive warmup length, as demonstrated in "Selecting the Metric in Hamiltonian Monte Carlo" (arXiv:1905.11916) and in the thread "New adaptive warmup proposal (looking for feedback)!", and with these you would more often see faster warmup given Pathfinder initialization. Unfortunately, these adaptive warmup methods have not yet been implemented in Stan.

Besides being useful for quickly testing your model code, since you can check that the quick result looks somewhat reasonable, Pathfinder may give you better mass matrix and step size adaptation with a very short warmup. For example, for one nasty posterior, iter_warmup=100, iter_sampling=100 already takes 5 minutes per chain, but we get much better ESSs with Pathfinder inits (running Pathfinder takes 5 s). Look at the rhat and ess columns. First, the posterior summary with default inits:

> print(f1, digits=1)
     variable    mean  median    sd  mad      q5     q95 rhat ess_bulk ess_tail
 lp__         -7509.0 -7426.0 202.0 53.0 -7945.8 -7350.8  1.4        9       18
 b[1]            -1.2    -1.2   0.1  0.1    -1.3    -1.0  1.0      544      258
 b[2]            -2.4    -2.4   0.1  0.2    -2.6    -2.1  1.1       52      295
 b[3]            -0.8    -0.8   0.1  0.1    -1.0    -0.7  1.0      452      374
 b[4]            -1.3    -1.3   0.1  0.1    -1.4    -1.2  1.0      464      348
 Intercept[1]     0.2     0.2   0.1  0.1     0.1     0.4  1.1       32      115
 Intercept[2]     2.4     2.4   0.1  0.1     2.2     2.6  1.2       18       38
 sd_1[1]          1.6     1.9   0.8  0.2     0.0     2.2  1.4       10       22
 sd_1[2]          1.3     1.3   0.1  0.1     1.1     1.5  1.2       17       48
 sd_1[3]          1.9     1.9   0.2  0.1     1.6     2.2  1.2       14       52

 # showing 10 of 4816 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)

and then the posterior summary with Pathfinder inits:

> print(f1i, digits=1)
     variable    mean  median   sd  mad      q5     q95 rhat ess_bulk ess_tail
 lp__         -7404.4 -7409.1 43.0 42.7 -7468.8 -7335.9  1.0       94      111
 b[1]            -1.2    -1.2  0.1  0.1    -1.3    -1.0  1.0      397      341
 b[2]            -2.4    -2.4  0.1  0.1    -2.7    -2.2  1.0      424      352
 b[3]            -0.8    -0.8  0.1  0.1    -1.0    -0.7  1.0      388      255
 b[4]            -1.3    -1.3  0.1  0.1    -1.4    -1.1  1.0      506      400
 Intercept[1]     0.2     0.2  0.1  0.1     0.1     0.4  1.0      480      297
 Intercept[2]     2.5     2.5  0.1  0.1     2.3     2.6  1.0      421      305
 sd_1[1]          2.0     2.0  0.1  0.1     1.8     2.2  1.0      216      397
 sd_1[2]          1.3     1.2  0.1  0.1     1.1     1.4  1.0      190      270
 sd_1[3]          2.0     2.0  0.1  0.1     1.7     2.2  1.1       95      178

 # showing 10 of 4816 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)

But if the default 1000 warmup iterations are used in this same case, there is no visible benefit from Pathfinder inits, which is good in the sense that the default 1000-iteration warmup is not sensitive to the initial values.
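
A comparison like the one above could be set up roughly as follows in cmdstanr. This is only a sketch, not the exact code behind the summaries shown: the function name compare_short_warmup and its arguments are made up for illustration, mod is assumed to be a CmdStanModel created with cmdstan_model(), and passing a Pathfinder fit directly as init is assumed to be supported by your cmdstanr version (it requires a recent release).

```r
# Sketch only. Assumptions: `mod` is a CmdStanModel from
# cmdstanr::cmdstan_model(), `data` is the matching data list, and the
# installed cmdstanr is recent enough to accept a fit object as `init`.
compare_short_warmup <- function(mod, data, chains = 4) {
  # Reference run: default (random) initial values, very short warmup.
  fit_default <- mod$sample(
    data = data,
    chains = chains,
    iter_warmup = 100,
    iter_sampling = 100
  )

  # Quick Pathfinder run; its draws serve as initial values below.
  fit_pathfinder <- mod$pathfinder(data = data)

  # Same short warmup, but each chain starts from a Pathfinder draw.
  fit_pf_inits <- mod$sample(
    data = data,
    chains = chains,
    init = fit_pathfinder,
    iter_warmup = 100,
    iter_sampling = 100
  )

  list(default = fit_default, pathfinder_inits = fit_pf_inits)
}

# Hypothetical usage (file name and data are placeholders):
# library(cmdstanr)
# mod  <- cmdstan_model("model.stan")
# fits <- compare_short_warmup(mod, data = my_data)
# print(fits$default$summary(), digits = 1)
# print(fits$pathfinder_inits$summary(), digits = 1)
```

Comparing the rhat, ess_bulk and ess_tail columns of the two summaries then shows whether the Pathfinder inits helped for the chosen warmup length.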
