Issue with dual averaging

To add on to this old conversation, one issue I noticed with dual averaging is that the final average (approximate) acceptance rate is almost always above the target adapt_delta.

When I look at the adaptation parameters (via get_sampler_params), it looks to me like the final window is too short. Per the above discussion, the adaptation starts each window high, then drops to counter that, and eventually comes back up toward the target. But the final window (adapt_term_buffer) is only 50 iterations by default, and that does not appear to be long enough for the step size to stabilize. For instance, here's a simple three-parameter multivariate model with 10 chains, warmup=1900 and iter=2000, using default settings in RStan (so adapt_delta=0.8). The final step sizes range from 0.018 to 0.024, and the acceptance rate ranges from 0.88 to 0.94 across the 10 chains, so all of them are well above the target of 0.8.

It seems to me that the step sizes are all too small because of that short final adaptation window, seen in this plot:

If I increase adapt_term_buffer to 1000 (obviously extreme), the final step sizes range from 0.027 to 0.035, and the acceptance rate ranges from 0.82 to 0.86. So the step sizes are larger, which moves the acceptance rate closer to the target by roughly 0.06 to 0.08.
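For anyone who wants to experiment with window length outside of Stan, here is a toy sketch in Python of a single dual-averaging window. It is not Stan's code: the update follows the Nesterov dual-averaging scheme as I understand it from the NUTS paper, with what I believe are Stan's default tuning values (gamma=0.05, t0=10, kappa=0.75), and the names adapt_window and toy_accept as well as the acceptance curve exp(-eps) are assumptions made up for illustration.

```python
import math


def adapt_window(eps0, n_iter, accept_fn, delta=0.8,
                 gamma=0.05, t0=10.0, kappa=0.75):
    """Run one dual-averaging window; return the averaged step size.

    Hypothetical helper, not part of Stan or RStan.
    """
    # Restart point: shrink toward 10x the step size the window starts with.
    mu = math.log(10.0 * eps0)
    s_bar = 0.0  # running average of (delta - acceptance statistic)
    x_bar = 0.0  # weighted average of log step size iterates
    eps = eps0
    for t in range(1, n_iter + 1):
        # Acceptance statistic at the current step size, capped at 1.
        accept = min(1.0, accept_fn(eps))
        eta = 1.0 / (t + t0)
        s_bar = (1.0 - eta) * s_bar + eta * (delta - accept)
        # New log step size iterate.
        x = mu - s_bar * math.sqrt(t) / gamma
        # Averaging weight decays as t^(-kappa); at t = 1 it equals 1.
        w = t ** (-kappa)
        x_bar = (1.0 - w) * x_bar + w * x
        eps = math.exp(x)
    # The step size kept after the window is the averaged iterate.
    return math.exp(x_bar)


def toy_accept(eps):
    """Assumed noise-free acceptance curve, purely illustrative."""
    return math.exp(-eps)


if __name__ == "__main__":
    # Compare a short and a long final window from the same starting point.
    for n in (50, 1000):
        eps = adapt_window(0.25, n, toy_accept)
        print(n, eps, toy_accept(eps))
```

The acceptance curve here is deterministic, so this only shows the mechanics of the restart and the averaging. Passing a noisy accept_fn is closer to what a sampler actually sees; this is not a claim that the toy reproduces the overshoot I reported above.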

I’m not proposing a terminal buffer size of 1000. I just noticed this behavior, and it is unexpected from a user's perspective (you specify a target of 0.8 and get >0.9). It appears closely related to this discussion topic and to how the step size changes during the early part of an adaptation window. It may be worth taking this aspect of the adaptation into account when considering alternatives, or at a minimum increasing the default for adapt_term_buffer.
