extra_checks_report.pdf
This looks pretty conclusive. Beyond the effective sample sizes for the x's, I also looked at those for the x^{2}'s, and I turned off metric adaptation to stabilize the runs. For each of these models the optimal scales are the default ones, and, like step size adaptation, metric adaptation can be a bit noisy and obscure the differences between the variants.
For the x^{2}'s, the additional checks are uniformly better than or equal to the nominal results in effective samples per gradient calculation, even for the correlated and heavy-tailed targets. For the x's, the nominal results are better in only a few instances, and there the difference is small.
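For anyone wanting to reproduce this kind of comparison, here's a minimal sketch of the metric. It assumes draws from a single chain and a known total gradient count. The ESS estimator (Geyer's initial positive sequence, single chain, no chain splitting) is a stand-in chosen for simplicity, not necessarily what produced the report. `ess_per_gradient` and the gradient count are hypothetical. In Stan the count would be roughly the sum of the `n_leapfrog__` sampler diagnostic. Looking at x^{2} alongside x is useful because the ESS for x^{2} reflects how well second moments (and hence variances) are estimated, which the ESS for x alone won't show.

```python
import math
import random


def autocorrelation(draws, lag):
    """Biased sample autocorrelation of `draws` at `lag` (assumes non-constant draws)."""
    n = len(draws)
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / n
    cov = sum((draws[t] - mean) * (draws[t + lag] - mean) for t in range(n - lag)) / n
    return cov / var


def effective_sample_size(draws):
    """Single-chain ESS with Geyer's initial positive sequence truncation.

    tau = -1 + 2 * sum_k (rho_{2k} + rho_{2k+1}), stopping at the first
    non-positive pair sum; ESS = n / tau.
    """
    n = len(draws)
    tau = -1.0
    lag = 0
    while lag + 1 < n:
        pair = autocorrelation(draws, lag) + autocorrelation(draws, lag + 1)
        if pair <= 0.0:
            break
        tau += 2.0 * pair
        lag += 2
    return n / tau


def ess_per_gradient(draws, n_gradients):
    """Hypothetical metric: ESS of x and of x^2, each divided by total gradient evaluations."""
    squares = [d * d for d in draws]
    return (effective_sample_size(draws) / n_gradients,
            effective_sample_size(squares) / n_gradients)


if __name__ == "__main__":
    # Stand-in for sampler output: a stationary AR(1) chain with unit marginal variance.
    random.seed(1)
    phi = 0.5
    x, draws = 0.0, []
    for _ in range(4000):
        x = phi * x + math.sqrt(1 - phi ** 2) * random.gauss(0.0, 1.0)
        draws.append(x)
    # Pretend each draw cost 7 gradient evaluations (hypothetical number).
    print(ess_per_gradient(draws, n_gradients=7 * len(draws)))
```

Dividing by gradient evaluations rather than by iterations is what makes variants with different trajectory lengths comparable: a variant that builds longer trajectories has to earn its extra cost in effective samples.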
I’ll clean up the code and work through updating all of the tests before submitting a PR in the next few days. Then we can update the step size adaptation criterion to the new acceptance statistic, and we’ll have a shiny new sampler variant for 2.21.