Great, thanks, @bbbales2! This is pretty much consistent with my expectations. Note that to avoid the run-to-run variability in ESS/time it’s often easier to compare ESS/n_leapfrog: because the autodiff is fixed, the only algorithmic differences will be in the number of leapfrog/gradient calls and not in the time per call.
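As a rough illustration of the ESS/n_leapfrog comparison, here is a minimal Python sketch. It assumes the total effective sample size has already been computed elsewhere and that each leapfrog step costs one gradient evaluation; the function name and the example numbers are hypothetical, not taken from any Stan tooling.

```python
def ess_per_gradient(total_ess, n_leapfrog_draws):
    """Effective samples per gradient evaluation.

    total_ess: effective sample size, assumed to be computed elsewhere.
    n_leapfrog_draws: the n_leapfrog__ value of each post-warmup iteration.
    Assumption: one leapfrog step costs exactly one gradient evaluation.
    """
    total_gradients = sum(n_leapfrog_draws)
    if total_gradients <= 0:
        raise ValueError("need at least one gradient evaluation")
    return total_ess / total_gradients


# Hypothetical run: 1000 iterations that each used 7 leapfrog steps.
n_leapfrog = [7] * 1000
print(ess_per_gradient(900.0, n_leapfrog))
```

Because the denominator counts gradient calls rather than seconds, the ratio does not depend on machine load or on how fast each gradient happens to run.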

Additionally, I avoid the older examples like the BUGS models and the radon model, as the priors and parameterizations are terrible. Similarly, I don’t consider any model that yields divergences in performance comparisons; once it’s divergent the performance is irrelevant.

When building up empirical comparisons I instead focus on models that capture single qualitative features of interest – high dimension, high linear correlations, heavy tails, etc – as they make it easier to identify which features, if any, might cause problems.
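To make the idea of single-feature targets concrete, here is a minimal Python sketch of three unnormalized log densities, one per feature. The function names are hypothetical, and the correlated target assumes unit variances with one common correlation `rho` between every pair of components, which may not match the exact models used in the comparisons.

```python
import math


def iid_normal_lpdf(x):
    """Unnormalized log density of independent standard normals."""
    return -0.5 * sum(xi * xi for xi in x)


def iid_student_t_lpdf(x, nu=5.0):
    """Unnormalized log density of independent Student t components
    with nu degrees of freedom (heavy tails)."""
    return -0.5 * (nu + 1.0) * sum(math.log1p(xi * xi / nu) for xi in x)


def equicorrelated_normal_lpdf(x, rho):
    """Unnormalized log density of a zero-mean normal with unit variances
    and a common pairwise correlation rho (an assumed covariance structure).

    Uses the closed-form inverse of (1 - rho) * I + rho * 1 1^T.
    """
    d = len(x)
    if not (-1.0 / max(d - 1, 1) < rho < 1.0):
        raise ValueError("rho outside the positive-definite range")
    s1 = sum(x)
    s2 = sum(xi * xi for xi in x)
    quad = (s2 - rho * s1 * s1 / (1.0 + (d - 1) * rho)) / (1.0 - rho)
    return -0.5 * quad


point = [0.5] * 10
print(iid_normal_lpdf(point))
print(iid_student_t_lpdf(point))
print(equicorrelated_normal_lpdf(point, 0.8))
```

Each target changes only one thing relative to the IID normal baseline (dimension, tail weight, or correlation), so a change in sampler performance can be attributed to that one feature.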

For this tweak my comparisons were

```
10 Dimensional IID Normal
  Nominal:
                      Mean     MCSE   StdDev    5%      50%   95%
    accept_stat__  8.9e-01  1.5e-03  1.1e-01  0.67  9.2e-01   1.0
    stepsize__     6.8e-01  1.7e-14  1.2e-14  0.68  6.8e-01  0.68
    treedepth__    2.8e+00  5.6e-03  3.7e-01   2.0  3.0e+00   3.0
    n_leapfrog__   6.6e+00  1.8e-02  1.2e+00   3.0  7.0e+00   7.0
    divergent__    0.0e+00      nan  0.0e+00  0.00  0.0e+00  0.00
    energy__       1.0e+01  7.2e-02  3.2e+00   5.4  9.7e+00    16
    99700 total effective samples
    2.848 effective samples per gradient evaluation
  Adapt Tweak:
                      Mean     MCSE   StdDev    5%      50%   95%
    accept_stat__  8.7e-01  1.6e-03  1.3e-01  0.61  9.1e-01   1.0
    stepsize__     8.2e-01  5.7e-15  4.0e-15  0.82  8.2e-01  0.82
    treedepth__    3.9e+00  2.6e-02  1.7e+00   2.0  4.0e+00   6.0
    n_leapfrog__   3.5e+01  6.4e-01  4.0e+01   3.0  2.3e+01   127
    divergent__    0.0e+00      nan  0.0e+00  0.00  0.0e+00  0.00
    energy__       1.0e+01  6.8e-02  3.2e+00   5.5  9.7e+00    16
    51700 total effective samples (48% reduction)
    1.477 effective samples per gradient evaluation (48% reduction)
100 Dimensional IID Normal
  Nominal:
                      Mean     MCSE   StdDev    5%      50%   95%
    accept_stat__  8.3e-01  2.0e-03  1.5e-01  0.56  8.6e-01   1.0
    stepsize__     4.8e-01  1.3e-14  9.4e-15  0.48  4.8e-01  0.48
    treedepth__    3.0e+00      nan  5.1e-14   3.0  3.0e+00   3.0
    n_leapfrog__   7.0e+00      nan  9.7e-14   7.0  7.0e+00   7.0
    divergent__    0.0e+00      nan  0.0e+00  0.00  0.0e+00  0.00
    energy__       1.0e+02  2.5e-01  1.0e+01    83  1.0e+02   117
    697300 total effective samples
    19.922 effective samples per gradient evaluation
  Adapt Tweak:
                      Mean     MCSE   StdDev    5%      50%   95%
    accept_stat__  8.4e-01  1.8e-03  1.3e-01  0.60  8.7e-01   1.0
    stepsize__     5.4e-01  5.3e-15  3.8e-15  0.54  5.4e-01  0.54
    treedepth__    3.0e+00      nan  5.1e-14   3.0  3.0e+00   3.0
    n_leapfrog__   7.0e+00      nan  9.7e-14   7.0  7.0e+00   7.0
    divergent__    0.0e+00      nan  0.0e+00  0.00  0.0e+00  0.00
    energy__       1.0e+02  2.4e-01  9.9e+00    84  1.0e+02   117
    932700 total effective samples (34% improvement)
    26.648 effective samples per gradient evaluation (34% improvement)
100 Dimensional IID Student t (5 degrees of freedom)
  Nominal:
                      Mean     MCSE   StdDev    5%      50%   95%
    accept_stat__  8.8e-01  1.5e-03  1.1e-01  0.67  9.1e-01  1.00
    stepsize__     3.3e-01  9.4e-16  6.7e-16  0.33  3.3e-01  0.33
    treedepth__    4.0e+00  2.0e-04  1.4e-02   4.0  4.0e+00   4.0
    n_leapfrog__   1.5e+01  1.7e-02  1.1e+00    15  1.5e+01    15
    divergent__    0.0e+00      nan  0.0e+00  0.00  0.0e+00  0.00
    energy__       1.2e+02  3.2e-01  1.2e+01    98  1.2e+02   136
    812700 total effective samples
    23.220 effective samples per gradient evaluation
  Adapt Tweak:
                      Mean     MCSE   StdDev    5%      50%   95%
    accept_stat__  8.8e-01  1.7e-03  1.2e-01  0.65  9.2e-01  1.00
    stepsize__     3.6e-01  9.6e-15  6.8e-15  0.36  3.6e-01  0.36
    treedepth__    4.1e+00  6.0e-03  3.8e-01   4.0  4.0e+00   5.0
    n_leapfrog__   2.0e+01  2.5e-01  1.5e+01    15  1.5e+01    47
    divergent__    0.0e+00      nan  0.0e+00  0.00  0.0e+00  0.00
    energy__       1.2e+02  3.2e-01  1.2e+01    97  1.2e+02   135
    815300 total effective samples (0.3% improvement)
    23.294 effective samples per gradient evaluation (0.3% improvement)
50 Dimensional Multi Normal, rho=0.25
  Nominal:
                      Mean     MCSE   StdDev    5%      50%   95%
    accept_stat__  8.4e-01  1.9e-03  1.4e-01  0.57  8.7e-01   1.0
    stepsize__     5.1e-01  2.8e-15  2.0e-15  0.51  5.1e-01  0.51
    treedepth__    3.0e+00  2.0e-04  1.4e-02   3.0  3.0e+00   3.0
    n_leapfrog__   7.0e+00  3.6e-03  2.5e-01   7.0  7.0e+00   7.0
    divergent__    0.0e+00      nan  0.0e+00  0.00  0.0e+00  0.00
    energy__       5.0e+01  1.7e-01  7.2e+00    39  5.0e+01    62
    364900 total effective samples
    10.425 effective samples per gradient evaluation
  Adapt Tweak:
                      Mean     MCSE   StdDev    5%      50%   95%
    accept_stat__  8.5e-01  1.8e-03  1.3e-01  0.60  8.9e-01   1.0
    stepsize__     5.5e-01  6.9e-15  4.9e-15  0.55  5.5e-01  0.55
    treedepth__    3.0e+00      nan  5.1e-14   3.0  3.0e+00   3.0
    n_leapfrog__   7.0e+00      nan  9.7e-14   7.0  7.0e+00   7.0
    divergent__    0.0e+00      nan  0.0e+00  0.00  0.0e+00  0.00
    energy__       5.0e+01  1.5e-01  7.1e+00    39  5.0e+01    62
    437100 total effective samples (20% improvement)
    12.488 effective samples per gradient evaluation (20% improvement)
10 Dimensional Multi Normal, rho=0.80
  Nominal:
                      Mean     MCSE   StdDev    5%      50%   95%
    accept_stat__  9.0e-01  1.2e-03  9.8e-02  0.70  9.3e-01   1.0
    stepsize__     3.1e-01  8.6e-15  6.1e-15  0.31  3.1e-01  0.31
    treedepth__    3.5e+00  1.0e-02  6.9e-01   3.0  4.0e+00   5.0
    n_leapfrog__   1.7e+01  1.4e-01  9.7e+00   7.0  1.5e+01    31
    divergent__    0.0e+00      nan  0.0e+00  0.00  0.0e+00  0.00
    energy__       9.9e+00  7.3e-02  3.1e+00   5.4  9.6e+00    15
    20100 total effective samples
    0.574 effective samples per gradient evaluation
  Adapt Tweak:
                      Mean     MCSE   StdDev    5%      50%   95%
    accept_stat__  9.1e-01  1.1e-03  9.0e-02  0.73  9.4e-01   1.0
    stepsize__     2.9e-01  7.9e-17  5.6e-17  0.29  2.9e-01  0.29
    treedepth__    3.5e+00  1.1e-02  7.0e-01   3.0  3.0e+00   5.0
    n_leapfrog__   1.7e+01  1.5e-01  9.9e+00   7.0  1.5e+01    31
    divergent__    0.0e+00      nan  0.0e+00  0.00  0.0e+00  0.00
    energy__       9.9e+00  7.4e-02  3.1e+00   5.3  9.6e+00    15
    19200 total effective samples (4% reduction)
    0.548 effective samples per gradient evaluation (4% reduction)
50 Dimensional Multi Normal, rho=0.80
  Nominal:
                      Mean     MCSE   StdDev    5%      50%   95%
    accept_stat__  8.6e-01  1.6e-03  1.2e-01  0.63  8.9e-01  1.00
    stepsize__     2.3e-01  5.4e-15  3.8e-15  0.23  2.3e-01  0.23
    treedepth__    4.7e+00  1.3e-02  8.0e-01   3.0  5.0e+00   6.0
    n_leapfrog__   4.0e+01  3.7e-01  2.4e+01    15  3.1e+01    79
    divergent__    0.0e+00      nan  0.0e+00  0.00  0.0e+00  0.00
    energy__       5.0e+01  1.7e-01  7.0e+00    39  5.0e+01    62
    182400 total effective samples
    5.211 effective samples per gradient evaluation
  Adapt Tweak:
                      Mean     MCSE   StdDev    5%      50%   95%
    accept_stat__  8.4e-01  1.9e-03  1.4e-01  0.56  8.7e-01  1.00
    stepsize__     2.8e-01  3.1e-16  2.2e-16  0.28  2.8e-01  0.28
    treedepth__    4.7e+00  1.1e-02  7.0e-01   4.0  5.0e+00   6.0
    n_leapfrog__   3.7e+01  3.9e-01  2.5e+01    15  3.1e+01    79
    divergent__    0.0e+00      nan  0.0e+00  0.00  0.0e+00  0.00
    energy__       5.0e+01  1.6e-01  6.9e+00    39  5.0e+01    62
    229000 total effective samples (26% improvement)
    6.542 effective samples per gradient evaluation (26% improvement)
```
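The improvement and reduction percentages in the output above can be recomputed from the total effective sample counts. The following is a minimal Python sketch; the helper name and the dictionary labels are hypothetical, and the counts are copied from the listing.

```python
def percent_change(nominal, tweak):
    """Relative change of the tweak over the nominal run, in percent."""
    return 100.0 * (tweak - nominal) / nominal


# (nominal, adapt tweak) total effective samples, copied from the listing.
total_ess = {
    "10-d IID normal": (99700, 51700),
    "100-d IID normal": (697300, 932700),
    "100-d IID Student t (5 dof)": (812700, 815300),
    "50-d multi normal, rho=0.25": (364900, 437100),
    "10-d multi normal, rho=0.80": (20100, 19200),
    "50-d multi normal, rho=0.80": (182400, 229000),
}

for name, (nominal, tweak) in total_ess.items():
    change = percent_change(nominal, tweak)
    label = "improvement" if change >= 0 else "reduction"
    print(f"{name}: {abs(change):.1f}% {label}")
```

Positive values are improvements of the tweak over the nominal adaptation, and negative values are reductions.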

Hopefully the pattern is clear and consistent with what you see. In high dimensions the tweak improves performance unless the tails are really heavy, in which case the performance is the same. In low dimensions the tweak can reduce performance because the tree depth distribution expands and the sampler wastes time on overly long trajectories. This is due to the higher step size causing the generalized No-U-Turn criterion to be evaluated at too crude a resolution, so the numerical trajectories continue much too far before finally being terminated. At the same time, this performance dip occurs in low-dimensional models which already run extremely quickly.