Issue with dual averaging

Agreed. Even if this worked out to be better in all cases, I imagine it wouldn't make it in as a default, given how core this component is and since Stan has had this behavior from the very beginning.

Even though it seems to let \alpha converge to \delta more quickly, there may be cases where people are relying on the current behavior. For example, if someone sets \delta to 0.9 and doesn't run warmup until it has fully converged, they may end up with an \alpha of 0.95, since the step size kept dropping during warmup. Improving the convergence would produce larger step sizes and an \alpha closer to the \delta they set. However, the larger step sizes may not be appropriate for the model, and they could end up with divergent transitions. Obviously, this could be fixed by setting \delta to an appropriate value.
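For anyone following along, here is a minimal sketch of the dual-averaging update from Hoffman & Gelman (2014) that Stan uses to adapt the step size. The constants match Stan's defaults (gamma=0.05, t0=10, kappa=0.75), and `accept_stats` is a stand-in for the sequence of per-iteration acceptance statistics \alpha; this is just an illustration of the recursion, not the actual Stan code.

```python
import math

def dual_averaging(accept_stats, delta=0.8, eps0=1.0,
                   gamma=0.05, t0=10.0, kappa=0.75):
    """Dual-averaging step size adaptation (Hoffman & Gelman, 2014).

    accept_stats: per-iteration acceptance statistics (alpha).
    Returns the averaged step size that warmup would settle on.
    """
    mu = math.log(10.0 * eps0)   # shrinkage target: 10x the initial step size
    log_eps_bar = 0.0            # running average of log step size
    h_bar = 0.0                  # running average of (delta - alpha)
    for t, alpha in enumerate(accept_stats, start=1):
        h_bar += (delta - alpha - h_bar) / (t + t0)
        log_eps = mu - math.sqrt(t) / gamma * h_bar
        eta = t ** -kappa
        log_eps_bar = eta * log_eps + (1.0 - eta) * log_eps_bar
    return math.exp(log_eps_bar)

# If alpha sits below delta early on, h_bar stays positive and the step
# size keeps shrinking; it only recovers as h_bar is averaged back toward
# zero, which is the slow convergence described above.
print(dual_averaging([0.95] * 100, delta=0.9))
```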

I’m going to do more testing, but if these results hold it would mean that I can shave 2 days off of a model that takes 7 days to run. I’ve spent a lot of effort trying to get this code to run faster, and this is by far the biggest improvement.

Before this, I had to increase init_buffer from 75 to about 200 because I kept getting chains with step sizes that were way too small, leaving them far outside the typical set after the default number of warmup iterations. Even with the large init_buffer, some chains would still not have converged, but they would usually be okay after a few adaptation windows that bumped the step size up by 10x. With this change, I've been able to go back to an init_buffer of 75 and shorten the rest of the warmup with no apparent adverse impact.
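For context, a rough sketch of how Stan's windowed warmup is laid out: an initial fast buffer of init_buffer iterations (step size only), slow metric-adaptation windows that double in size, and a terminal fast buffer of term_buffer iterations. The function below just computes approximate window boundaries under those assumptions, using Stan's defaults of init_buffer=75, window=25, term_buffer=50; it is an illustration, not the actual implementation.

```python
def warmup_windows(num_warmup=1000, init_buffer=75,
                   window=25, term_buffer=50):
    """Approximate boundaries of Stan's slow adaptation windows.

    Phase I  (init_buffer): fast step size adaptation only.
    Phase II: slow windows estimating the metric, doubling in size.
    Phase III (term_buffer): fast step size adaptation again.
    """
    slow_end = num_warmup - term_buffer
    bounds, start, size = [], init_buffer, window
    while start + size < slow_end:
        # If the next (doubled) window would run past the end of the
        # slow phase, extend the current window to absorb the remainder.
        if start + 3 * size >= slow_end:
            size = slow_end - start
        bounds.append((start, start + size))
        start += size
        size *= 2
    return bounds

# With the defaults this gives slow windows of 25, 50, 100, 200, 500
# iterations; the step size is re-initialized at each window boundary,
# which is where the 10x jumps mentioned above can happen.
print(warmup_windows())
```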

@andre.pfeuffer
I pushed the code to my git repo here:

I'm not sure how to run the Jenkins tests, though…
