After comparing many combinations of the tuning parameters (warmup, iter, window, term_buffer, max_treedepth, metric=dense_e vs. diag_e, adapt_delta), the following combinations seem to give the shortest run time while keeping the sampling results at an acceptable level:
diag5.4K[35,150,0.82,600]_5.8 1.4hrs.txt (4.3 KB)
dense5.4K[35,150,0.982,600]_6.7 1.4hrs.txt (4.3 KB)
diag9K[35,150,0.82,600]_12 4.1hrs.txt (4.3 KB)
dense9K[25,50,0.982,600]_13.7 3.7hrs.txt (4.3 KB)
The takeaways seem to be:
- For a given amount of simulated data points (9K or 5.4K), diag_e runs a bit faster than dense_e, mainly because the former can take a lower level of adapt_delta (0.82, instead of 0.982) without leading to divergences.
  - But as far as the parameters of interest are concerned, the N_Effs from diag_e seem to be more diverse, with the lowest being acceptable but substantially lower than the lowest of those from dense_e.
- The default choices of window, init_buffer, and term_buffer (= 25, 75, 50) appear to be quite good: alternative choices do not lead to dramatic improvements, especially when the sample size is smaller (i.e., 5.4K rather than 9K).
  - (window, term_buffer) = (35, 150) performs slightly better. I am not sure of the reason, but my guess is that a larger window (35, rather than 25) reduces the iterations spent on the last and slowest stage of the adaptation process, where further search for improvement yields little incremental gain. On the other hand, a larger term_buffer (150, rather than 50) allocates more iterations to optimally adjust the initial typical set of estimates according to the findings from the adaptation process.
  - Despite the rationalization given above, increasing (window, term_buffer) beyond (35, 150) would not necessarily improve things further, owing to the complicated adaptation process. There is simply no obvious relation between the run time and these tuning parameters. Apparently, they interact in a complicated way with warmup and iter to determine the run time.
- During the course of my comparison, I got the impression that choosing an unnecessarily high level of adapt_delta often results in hitting the max_treedepth (many such instances can lead to a rather long run time). So avoiding divergences is about finding a suitable level of adapt_delta, not about setting it as high as possible.
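
To make the window/term_buffer discussion above a bit more concrete, here is a rough Python sketch of how I understand the warmup to be split: a fast initial buffer, a series of slow windows whose size doubles each time, and a fast terminal buffer. This is only a simplified approximation written for illustration, not Stan's actual code; the function name slow_windows is made up, and warmup=600 is just an illustrative value.

```python
def slow_windows(warmup, init_buffer=75, window=25, term_buffer=50):
    """Approximate layout of the slow adaptation windows during warmup.

    Simplified sketch (an assumption, not Stan's implementation):
    windows start after init_buffer, double in size each time, and the
    last one is stretched to reach the start of term_buffer.
    Returns a list of (start, end) iteration indices, end exclusive.
    """
    if init_buffer + window + term_buffer > warmup:
        raise ValueError("buffers and base window do not fit into warmup")

    slow_end = warmup - term_buffer  # where the terminal fast phase begins
    windows = []
    start, size = init_buffer, window
    while start < slow_end:
        end = start + size
        # If the following (doubled) window would not fit before the
        # terminal buffer, stretch the current window to the end instead.
        if end + 2 * size > slow_end:
            end = slow_end
        windows.append((start, end))
        start, size = end, 2 * size
    return windows


if __name__ == "__main__":
    for label, kwargs in [
        ("default (25, 50)", {}),
        ("alternative (35, 150)", {"window": 35, "term_buffer": 150}),
    ]:
        layout = slow_windows(600, **kwargs)
        sizes = [end - start for start, end in layout]
        print(label, layout, sizes)
```

Under these assumptions, a 600-iteration warmup with the defaults gives four slow windows of 25, 50, 100 and 300 iterations, while (window, term_buffer) = (35, 150) gives three windows of 35, 70 and 270 iterations followed by a longer terminal phase. This only shows how the layout shifts; it does not by itself explain the run-time differences reported above.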