With the current Stan implementation, you are unlikely to see much difference in warmup speed, except for very challenging posteriors or when the default initialization would start in a region where, for example, an ODE solver would struggle.
Stan warmup uses a fixed number of iterations, and the default number has been chosen to be safe and conservative. It would be possible to use an adaptive warmup length, as demonstrated in "[1905.11916] Selecting the Metric in Hamiltonian Monte Carlo" and "New adaptive warmup proposal (looking for feedback)!". With such methods you would more often see faster warmup given Pathfinder initialization. Unfortunately, these adaptive warmup methods have not yet been implemented in Stan.
Besides being useful for quickly testing your model code (you can check that the quick result looks somewhat reasonable), Pathfinder initialization may give you better mass matrix and step size adaptation when warmup is very short. For example, for one nasty posterior, even iter_warmup=100, iter_sampling=100 already takes 5 minutes per chain, but we get much better ESS with Pathfinder inits (running Pathfinder itself takes 5 seconds). Look at the Rhat and ESS values. First, the posterior summary with default inits:
> print(f1, digits=1)
variable        mean  median    sd  mad      q5     q95 rhat ess_bulk ess_tail
lp__         -7509.0 -7426.0 202.0 53.0 -7945.8 -7350.8  1.4        9       18
b[1]            -1.2    -1.2   0.1  0.1    -1.3    -1.0  1.0      544      258
b[2]            -2.4    -2.4   0.1  0.2    -2.6    -2.1  1.1       52      295
b[3]            -0.8    -0.8   0.1  0.1    -1.0    -0.7  1.0      452      374
b[4]            -1.3    -1.3   0.1  0.1    -1.4    -1.2  1.0      464      348
Intercept[1]     0.2     0.2   0.1  0.1     0.1     0.4  1.1       32      115
Intercept[2]     2.4     2.4   0.1  0.1     2.2     2.6  1.2       18       38
sd_1[1]          1.6     1.9   0.8  0.2     0.0     2.2  1.4       10       22
sd_1[2]          1.3     1.3   0.1  0.1     1.1     1.5  1.2       17       48
sd_1[3]          1.9     1.9   0.2  0.1     1.6     2.2  1.2       14       52
# showing 10 of 4816 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)
And then the posterior summary with Pathfinder inits:
> print(f1i, digits=1)
variable        mean  median    sd  mad      q5     q95 rhat ess_bulk ess_tail
lp__         -7404.4 -7409.1  43.0 42.7 -7468.8 -7335.9  1.0       94      111
b[1]            -1.2    -1.2   0.1  0.1    -1.3    -1.0  1.0      397      341
b[2]            -2.4    -2.4   0.1  0.1    -2.7    -2.2  1.0      424      352
b[3]            -0.8    -0.8   0.1  0.1    -1.0    -0.7  1.0      388      255
b[4]            -1.3    -1.3   0.1  0.1    -1.4    -1.1  1.0      506      400
Intercept[1]     0.2     0.2   0.1  0.1     0.1     0.4  1.0      480      297
Intercept[2]     2.5     2.5   0.1  0.1     2.3     2.6  1.0      421      305
sd_1[1]          2.0     2.0   0.1  0.1     1.8     2.2  1.0      216      397
sd_1[2]          1.3     1.2   0.1  0.1     1.1     1.4  1.0      190      270
sd_1[3]          2.0     2.0   0.1  0.1     1.7     2.2  1.1       95      178
# showing 10 of 4816 rows (change via 'max_rows' argument or 'cmdstanr_max_rows' option)
However, if the default 1000 warmup iterations are used in this same case, there is no visible benefit from Pathfinder inits over the default inits. That is good in the sense that the default 1000-iteration warmup is not sensitive to the initial values.
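For anyone wanting to try the same comparison, here is a minimal cmdstanr sketch. It assumes a recent cmdstanr and CmdStan with Pathfinder support, where the `init` argument of `$sample()` accepts a fitted Pathfinder object. The file name `model.stan` and the data list `data_list` are hypothetical placeholders for your own model and data, and the seeds and chain settings are arbitrary choices.

```r
library(cmdstanr)

# Hypothetical model file and data list: replace with your own
mod <- cmdstan_model("model.stan")

# Quick approximate fit with Pathfinder, used here only to generate inits
pth <- mod$pathfinder(data = data_list, seed = 1)

# Very short warmup with the default (random) inits
f1 <- mod$sample(data = data_list, seed = 1, chains = 4, parallel_chains = 4,
                 iter_warmup = 100, iter_sampling = 100)

# Same short warmup, with inits taken from the Pathfinder draws
f1i <- mod$sample(data = data_list, seed = 1, chains = 4, parallel_chains = 4,
                  init = pth, iter_warmup = 100, iter_sampling = 100)

# Compare Rhat and ESS between the two runs
print(f1, digits = 1)
print(f1i, digits = 1)
```

Dropping `iter_warmup = 100` from both `$sample()` calls falls back to the default 1000 warmup iterations, which is the setting where the difference disappears.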