NUTS variation

A note on naming conventions: we are long past the “NUTS” algorithm. There will be a renaming around Stan3, but for now just refer to it as “dynamic HMC”.

In any case, it’s better to think about the possible parallelization speedup this way: first speculatively integrate forwards and backwards in time, then run the sampler to consume those speculative trajectories. If the sampler fully consumes one of the trajectories, you can expand the speculation and continue.
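
As a concrete sketch of that scheme, here is a toy Python illustration, assuming a standard-normal target and made-up helper names; this is not Stan’s code, just the consume/expand bookkeeping with two hypothetical workers speculating in both directions:

```python
import numpy as np

def leapfrog(q, p, eps, grad):
    """One leapfrog step of the Hamiltonian dynamics (illustrative helper)."""
    p = p + 0.5 * eps * grad(q)
    q = q + eps * p
    p = p + 0.5 * eps * grad(q)
    return q, p

def speculate(buf, n, eps, grad):
    """Integrate n further states past the end of a buffer; in the parallel
    scheme this would run on its own worker, ahead of the sampler."""
    q, p = buf[-1]
    for _ in range(n):
        q, p = leapfrog(q, p, eps, grad)
        buf.append((q, p))

def build_trajectory(q0, p0, eps, grad, max_doublings, rng):
    # Backward-in-time integration is forward integration with flipped momentum.
    fwd, bwd = [(q0, p0)], [(q0, -p0)]
    used_fwd = used_bwd = 0
    for depth in range(max_doublings):
        n_new = 2 ** depth
        # Two hypothetical workers speculate both directions at once...
        speculate(fwd, n_new, eps, grad)
        speculate(bwd, n_new, eps, grad)
        # ...but each doubling consumes only one side, chosen at random,
        # so the other side's fresh states are wasted effort.
        if rng.random() < 0.5:
            used_fwd += n_new
        else:
            used_bwd += n_new
        # (the dynamic termination criterion is omitted from this sketch)
    wasted = (len(fwd) - 1 - used_fwd) + (len(bwd) - 1 - used_bwd)
    return used_fwd, used_bwd, wasted

rng = np.random.default_rng(1)
used_f, used_b, wasted = build_trajectory(np.zeros(2), rng.normal(size=2),
                                          0.1, lambda q: -q, 8, rng)
print(used_f, used_b, wasted)  # half of the 510 speculated states go unused
```

In this toy version exactly half of the speculated states go unused, which is the waste discussed next.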

Because the expansion is multiplicative, however, you’re likely to end up wasting a bunch of that speculative computation on both sides, yielding an average speedup well below 1.5. Moreover, you’d have to keep all of those speculative states in memory, which becomes a significant burden for higher-dimensional problems. One of the big advantages of multiplicative expansion is that it needs only a logarithmic number of states in memory at any given time.
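
To make the memory point concrete, here is a toy sketch of the bookkeeping that multiplicative expansion allows (again with illustrative names, not Stan’s actual tree-building code): adjacent equal-sized subtrees are merged into a summary holding only their outermost states plus one sampled proposal, so a trajectory of 2^d states never requires more than about d summaries at once.

```python
import numpy as np

def merge(left, right, rng):
    """Collapse two adjacent equal-sized subtrees into one summary that keeps
    only the outermost states (for a U-turn check, omitted here) and a single
    proposal, sampled here by subtree size as a stand-in for the real
    multinomial weighting."""
    n = left["n"] + right["n"]
    proposal = right["proposal"] if rng.random() < right["n"] / n else left["proposal"]
    return {"n": n, "leftmost": left["leftmost"],
            "rightmost": right["rightmost"], "proposal": proposal}

def summarize_trajectory(states, rng):
    """Consume 2**d states while holding at most about d summaries: equal-sized
    neighbors merge as soon as both exist, like incrementing a binary counter."""
    stack, peak = [], 0
    for s in states:
        node = {"n": 1, "leftmost": s, "rightmost": s, "proposal": s}
        while stack and stack[-1]["n"] == node["n"]:
            node = merge(stack.pop(), node, rng)
        stack.append(node)
        peak = max(peak, len(stack))
    return stack, peak

rng = np.random.default_rng(0)
stack, peak = summarize_trajectory(range(2 ** 10), rng)
print(len(stack), peak)  # 1 final summary; never more than 10 held for 1024 states
```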

Parallelizable resources are much better spent on speeding up the gradient evaluation or on running multiple chains in memory to pool adaptation information.
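
For the multiple-chains idea, here is one way pooled adaptation could look; everything in this sketch is hypothetical (a stub stands in for the per-chain sampler, and this is not Stan’s adaptation code): each warmup window runs the chains in parallel, then their draws are pooled to re-estimate a shared diagonal metric.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def run_warmup_window(seed, q0, metric, n_iters):
    """Hypothetical stand-in for one chain's warmup window under a fixed
    metric; a real version would run dynamic HMC, this stub just returns
    draws of the right shape."""
    rng = np.random.default_rng(seed)
    return q0 + rng.normal(size=(n_iters, q0.size)) * np.sqrt(metric)

def pooled_adaptation(q0, n_chains=4, n_windows=5, n_iters=100):
    metric = np.ones(q0.size)                # start from a unit diagonal metric
    states = [q0] * n_chains
    for window in range(n_windows):
        with ProcessPoolExecutor(max_workers=n_chains) as pool:
            results = list(pool.map(run_warmup_window,
                                    [window * n_chains + c for c in range(n_chains)],
                                    states,
                                    [metric] * n_chains,
                                    [n_iters] * n_chains))
        draws = np.concatenate(results)      # pool draws across all chains...
        metric = draws.var(axis=0)           # ...for a shared metric estimate
        states = [r[-1] for r in results]    # each chain continues from its end
    return metric

if __name__ == "__main__":
    print(pooled_adaptation(np.zeros(10)))
```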