Adapt_delta

I ran a Stan program and got this warning message:

There were 23 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

I went to the webpage and read the description, which was great. And then I re-ran my code setting adapt_delta:

fit <- stan("chickens.stan", data=data, control=list(adapt_delta=0.9))

And it worked fine.

Here’s my question. If our first recommendation is to increase adapt_delta, why not do this automatically? As a user, I’d find that convenient.
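For what it’s worth, it is easy to script this yourself. Here is a minimal sketch of such an automatic retry, assuming rstan; refit_until_clean is a made-up name, while get_num_divergent() is rstan’s divergence diagnostic:

library(rstan)

# Hypothetical wrapper (not part of rstan): refit with a higher
# adapt_delta until no divergences remain, or give up after max_tries.
refit_until_clean <- function(file, data, adapt_delta = 0.8, max_tries = 3) {
  for (i in seq_len(max_tries)) {
    fit <- stan(file, data = data, control = list(adapt_delta = adapt_delta))
    if (get_num_divergent(fit) == 0) return(fit)
    # Push adapt_delta toward 1, e.g. 0.8 -> 0.9 -> 0.95
    adapt_delta <- 1 - (1 - adapt_delta) / 2
  }
  warning("divergences remain at adapt_delta = ", adapt_delta)
  fit
}

fit <- refit_until_clean("chickens.stan", data = data)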


Increasing adapt_delta means the sampler takes smaller steps, so it will also take longer to run.

I’m still a beginner with Stan too, but as far as I understand, adapt_delta is the target average acceptance probability that Stan tunes for during warmup. The probability of accepting a draw is related to the step size, i.e. how far the sampler “jumps” on each draw. To increase the probability of acceptance, the sampler needs to decrease the step size and take smaller, more careful steps.
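If you want to see this in your own fit, rstan records the adapted step size in the sampler parameters. A minimal sketch, assuming the same chickens.stan model; stepsize__ is the column name rstan uses:

library(rstan)

# Compare the adapted step size at two adapt_delta targets.
# A higher target acceptance rate should yield a smaller stepsize__.
fit_low  <- stan("chickens.stan", data = data, control = list(adapt_delta = 0.8))
fit_high <- stan("chickens.stan", data = data, control = list(adapt_delta = 0.95))

step_size <- function(fit) {
  sp <- get_sampler_params(fit, inc_warmup = FALSE)
  mean(sapply(sp, function(chain) mean(chain[, "stepsize__"])))
}

step_size(fit_low)   # larger steps
step_size(fit_high)  # smaller steps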

If you imagine the posterior (or the typical set) as a tall hill in the middle of a flat plain, then what a Monte Carlo sampler does is try to map out the shape of the hill by taking random steps around it and measuring the height at each step. It always takes a step if the elevation at the next location is higher, and otherwise takes it with probability equal to the ratio (next location elevation / current location elevation). If the steps the sampler takes are very big, it will often miss or “overshoot” the area of higher elevation, and so its acceptance rate will be lower. If the sampler takes very small steps, its acceptance rate will be high, but it will be very slow and take a long time to explore the hill.
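That trade-off is easy to demonstrate with a toy random-walk Metropolis sampler in plain R, using a standard normal density as the “hill” (this is the simple accept/reject rule described above, not Stan’s HMC):

set.seed(1)

# Random-walk Metropolis on a standard normal "hill".
# Uphill moves are always accepted; downhill moves are accepted with
# probability density(proposal) / density(current).
metropolis <- function(step, n = 1e4) {
  x <- numeric(n); accepted <- 0
  for (i in 2:n) {
    proposal <- x[i - 1] + rnorm(1, sd = step)
    if (runif(1) < dnorm(proposal) / dnorm(x[i - 1])) {
      x[i] <- proposal; accepted <- accepted + 1
    } else {
      x[i] <- x[i - 1]
    }
  }
  accepted / n  # acceptance rate
}

metropolis(step = 5)    # big steps: low acceptance rate
metropolis(step = 0.1)  # tiny steps: high acceptance, slow exploration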


That matches my understanding: it works like a learning rate in deep learning.

To continue this thread,

  1. If one’s ESS is low, instead of increasing iter, one could decrease adapt_delta as a means of reducing the autocorrelation between successive draws (a larger step size means accepted proposals land farther apart). This comes at the cost of rejecting more proposals, so the chain moves less often.
  2. In addition, there is a direct computational advantage in reducing adapt_delta: with a larger step size, generating each draw requires fewer leapfrog steps, so the calculation time per iteration goes down.
  3. Further, if one were naively interested in optimizing adapt_delta for a given problem, theoretically one would want to maximize the rate at which ESS increases with wall time, given a particular set of computational resources (see the sketch below).
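
As a rough sketch of 3), one could time fits at several adapt_delta values and compare ESS per second. This assumes rstan and a parameter named theta in the model; both names are placeholders:

library(rstan)

# Rough sketch: compare ESS per second across adapt_delta settings.
# "theta" is a stand-in for whichever parameter you care about.
ess_per_second <- function(adapt_delta) {
  fit <- stan("chickens.stan", data = data,
              control = list(adapt_delta = adapt_delta))
  ess  <- summary(fit)$summary["theta", "n_eff"]
  time <- sum(get_elapsed_time(fit))  # warmup + sampling, all chains
  ess / time
}

sapply(c(0.7, 0.8, 0.9, 0.99), ess_per_second)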

Is my understanding in 1–3 correct?