Possibility of using dual averaging technique for the whole sample (not only during warm up)

H.Nik · March 26, 2018, 9:00pm

Dear experts,

Is it possible to use the dual averaging algorithm in HMC and NUTS for the whole sample, not only during warmup?
(We want to keep the acceptance rate at a certain value.)

Does it cause any problem with ergodicity, or does it disturb the Markov chain's stationary distribution?
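
For reference, the dual averaging scheme being asked about can be sketched as below. This is a minimal sketch following the update equations in Hoffman and Gelman's NUTS paper, using Stan's default constants (gamma = 0.05, t0 = 10, kappa = 0.75). The function name and the `accept_stat` callback are hypothetical: `accept_stat` is a deterministic stand-in for the acceptance statistic of one transition, which in a real sampler is random and depends on the current state. Stan runs this kind of feedback loop only during warmup and then freezes the step size at the averaged value `eps_bar`; running it for the whole sample means the transition kernel keeps changing from one iteration to the next.

```python
import math


def dual_averaging(accept_stat, n_iter, eps0=1.0, delta=0.8,
                   gamma=0.05, t0=10.0, kappa=0.75):
    """Dual averaging step-size adaptation (sketch, Hoffman & Gelman style).

    accept_stat(eps) is a hypothetical stand-in for the acceptance statistic
    of one HMC/NUTS transition run with step size eps.
    Returns (last step size, averaged step size eps_bar).
    """
    mu = math.log(10.0 * eps0)  # log(eps) is shrunk toward this value
    h_bar = 0.0                 # weighted running average of (delta - alpha)
    log_eps_bar = 0.0           # iterate-averaged log step size
    eps = eps0
    for m in range(1, n_iter + 1):
        alpha = accept_stat(eps)          # statistic from the step just taken
        w = 1.0 / (m + t0)
        h_bar = (1.0 - w) * h_bar + w * (delta - alpha)
        log_eps = mu - math.sqrt(m) / gamma * h_bar
        eta = m ** (-kappa)
        log_eps_bar = eta * log_eps + (1.0 - eta) * log_eps_bar
        eps = math.exp(log_eps)
    return eps, math.exp(log_eps_bar)


if __name__ == "__main__":
    # Toy acceptance curve exp(-eps); the fixed point for delta = 0.65
    # is eps = -log(0.65), roughly 0.43.
    print(dual_averaging(lambda e: math.exp(-e), 1000, delta=0.65))
```

With a deterministic acceptance curve both returned values settle near the step size whose acceptance equals `delta`; with a real, noisy acceptance statistic the raw `eps` keeps fluctuating, which is why Stan uses `eps_bar` once adaptation ends.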

Any help would be greatly appreciated.
Best,
H.N

sakrejda · March 27, 2018, 2:36am

What are you actually trying to do?

H.Nik · March 27, 2018, 3:50pm

I'm trying to keep the acceptance rate of HMC around a certain value (0.65) for the whole sample, because that is how I control the number of model calls in my problem.
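
The link between acceptance rate and model calls can be sketched for static HMC with a fixed integration time T: the trajectory takes about T / eps leapfrog steps, each costing one gradient evaluation, so a smaller step size (which typically raises acceptance) also raises the cost per iteration. The sketch below assumes a 1-D standard normal target and a hypothetical function name; NUTS chooses the trajectory length adaptively, and Stan reports the steps actually taken per iteration in its `n_leapfrog__` output column.

```python
import math


def leapfrog_hmc_step(q0, p0, eps, n_steps):
    """One static-HMC trajectory on U(q) = q**2 / 2 (1-D standard normal).

    Returns (Metropolis acceptance probability, number of gradient calls).
    """
    calls = 0

    def grad_u(q):
        nonlocal calls
        calls += 1  # each call stands in for one expensive model evaluation
        return q    # dU/dq for U(q) = q**2 / 2

    q, p = q0, p0
    g = grad_u(q)
    for _ in range(n_steps):
        p -= 0.5 * eps * g  # half step in momentum
        q += eps * p        # full step in position
        g = grad_u(q)
        p -= 0.5 * eps * g  # half step in momentum
    h0 = 0.5 * (q0 ** 2 + p0 ** 2)  # Hamiltonian at the start
    h1 = 0.5 * (q ** 2 + p ** 2)    # Hamiltonian at the end
    return min(1.0, math.exp(h0 - h1)), calls


if __name__ == "__main__":
    T = 1.0  # hypothetical fixed integration time
    for eps in (0.05, 0.2, 0.8):
        n_steps = max(1, round(T / eps))
        acc, calls = leapfrog_hmc_step(1.0, 0.5, eps, n_steps)
        print(f"eps={eps}: {n_steps} steps, {calls} gradient calls, accept={acc:.4f}")
```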

Does this disturb the Markov chain's stationary distribution?

sakrejda · March 27, 2018, 3:54pm

Yes, because Stan can no longer calculate the correct acceptance statistic, so the forward and reverse simulation in the integrator no longer represents a reversible path (that may not be exactly the right term). You can set adapt_delta, the target acceptance statistic used during warmup adaptation, to a variety of values, and that should get you close to what you want.
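
The effect of never-ending adaptation can be checked exactly on a toy problem. The sketch below uses random-walk Metropolis on three states rather than HMC, and a hypothetical adaptation rule standing in for dual averaging: take a step of 2 after an acceptance and a step of 1 after a rejection. Each fixed-step kernel leaves the target invariant, but because the step size reacts to the chain's own acceptances, the pair (state, step size) forms a Markov chain whose stationary marginal over the state is no longer the target. For the target (1/2, 1/4, 1/4), power iteration on the joint chain gives (1/2, 7/24, 5/24). All function names are illustrative.

```python
def metropolis_moves(x, step, target):
    """Outcomes of one random-walk Metropolis step from state x.

    Proposes x - step or x + step with probability 1/2 each; proposals off
    the grid are always rejected. Returns (next_state, prob, accepted) triples.
    """
    outcomes = []
    for y in (x - step, x + step):
        a = min(1.0, target[y] / target[x]) if 0 <= y < len(target) else 0.0
        if a > 0.0:
            outcomes.append((y, 0.5 * a, True))
        if a < 1.0:
            outcomes.append((x, 0.5 * (1.0 - a), False))
    return outcomes


def stationary(rows, n_states, iters=10000):
    """Stationary distribution by power iteration; rows[i] lists (j, P[i][j])."""
    dist = [1.0 / n_states] * n_states
    for _ in range(iters):
        new = [0.0] * n_states
        for i, row in enumerate(rows):
            for j, p in row:
                new[j] += dist[i] * p
        dist = new
    return dist


def fixed_step_marginal(target, step):
    """Stationary distribution when the step size never changes."""
    rows = [[(y, p) for y, p, _ in metropolis_moves(x, step, target)]
            for x in range(len(target))]
    return stationary(rows, len(target))


def adaptive_step_marginal(target, small=1, large=2):
    """State marginal of the joint chain (x, next step size) under the
    hypothetical rule: large step after an accept, small after a reject."""
    n = len(target)

    def idx(x, s):
        return 2 * x + (0 if s == small else 1)

    rows = [None] * (2 * n)
    for x in range(n):
        for s in (small, large):
            rows[idx(x, s)] = [(idx(y, large if acc else small), p)
                               for y, p, acc in metropolis_moves(x, s, target)]
    joint = stationary(rows, 2 * n)
    return [joint[idx(x, small)] + joint[idx(x, large)] for x in range(n)]


if __name__ == "__main__":
    target = [0.5, 0.25, 0.25]
    print("fixed step 1:", fixed_step_marginal(target, 1))
    print("adaptive:    ", adaptive_step_marginal(target))
```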

H.Nik · March 27, 2018, 4:21pm

Thank you very much for your helpful point, sakrejda.

I have another question, unrelated to my first one, but I'm very curious to know the answer.
We have a 1-D likelihood function (with mean zero) and a multivariate standard normal prior.
If we use a sequence of sigma values for the likelihood (basically shrinking the likelihood's sigma from 1 to 0.3 with exponential decay), does that also disturb the Markov chain's stationary distribution?

Does it change the mean of the posterior distribution?

We are only doing this to find/capture sample points in our region of interest.
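
Under simplifying assumptions the effect of the sigma schedule can be worked out in closed form. Assume (hypothetically) that \theta is 1-D, so the prior and likelihood dimensions match, with prior \theta ~ N(0, 1) and a Gaussian likelihood y | \theta ~ N(\theta, sigma^2) for a single observation y. Each sigma then defines a different posterior, N(y / (1 + sigma^2), sigma^2 / (1 + sigma^2)), so a chain whose sigma keeps shrinking is not targeting any single one of them. With y = 0 ("mean zero") the posterior mean stays at 0 while the variance shrinks from 0.5 to about 0.083; with y != 0 the mean moves as well. The function names are illustrative.

```python
def sigma_schedule(n_stages, start=1.0, end=0.3):
    """Exponentially decaying sigmas from start to end (hypothetical schedule)."""
    ratio = (end / start) ** (1.0 / (n_stages - 1))
    return [start * ratio ** k for k in range(n_stages)]


def gaussian_posterior(y, sigma):
    """Posterior (mean, variance) for theta ~ N(0, 1), y | theta ~ N(theta, sigma^2)."""
    precision = 1.0 + 1.0 / sigma ** 2
    mean = (y / sigma ** 2) / precision  # equals y / (1 + sigma^2)
    return mean, 1.0 / precision         # variance equals sigma^2 / (1 + sigma^2)


if __name__ == "__main__":
    for sigma in sigma_schedule(5):
        for y in (0.0, 1.0):
            m, v = gaussian_posterior(y, sigma)
            print(f"sigma={sigma:.3f} y={y}: mean={m:.3f} var={v:.3f}")
```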

Any help would be greatly appreciated.
Thank you!

sakrejda · March 27, 2018, 4:34pm

In general, basically "yes", but that brings back the original question: what are you actually trying to do! Please just start a new thread; it'll get lost down here.

Bob_Carpenter · April 1, 2018, 6:27pm

The dimensionality of \theta in the prior p(\theta) and the likelihood p(y | \theta) should match.

The answer's almost always "no" with an adaptation scheme unless it's done very carefully (which almost never matches intuitions about what a good method would look like). Specifically, you need to prove that any MCMC algorithm you devise preserves the correct stationary distribution (usually the posterior, but for Stan always the log density defined by the Stan program). Fair warning: it's not easy, which is why NUTS was such a breakthrough.

The easiest way to do that is through detailed balance. Guessing isn't a good strategy in this business. The usual approach is to start with Metropolis and learn why it satisfies detailed balance, then go on to Metropolis-Hastings and Gibbs. Basic HMC is then just an instance of Metropolis-Hastings. You can then look at some of the adaptive Metropolis algorithms, which go in the direction you're asking about. For NUTS, the Hoffman and Gelman paper does a good job explaining all the steps required for maintaining detailed balance.
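
As a starting point for that exercise, detailed balance, pi(x) P(x, y) = pi(y) P(y, x), can be checked numerically on a small discrete state space. The sketch below uses a hypothetical target and asymmetric proposal matrix chosen for illustration; it builds the Metropolis-Hastings transition matrix with and without the Hastings correction Q(y, x) / Q(x, y) and reports the largest detailed-balance violation. With the correction the violation is zero up to rounding; applying plain Metropolis acceptance to an asymmetric proposal gives a violation of 0.12 for these numbers.

```python
def mh_transition_matrix(target, proposal, hastings=True):
    """Transition matrix of Metropolis(-Hastings) on a finite state space.

    proposal[x][y] is the probability of proposing y from x. With
    hastings=False the proposal asymmetry is (wrongly) ignored.
    """
    n = len(target)
    P = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if y == x or proposal[x][y] == 0.0:
                continue
            ratio = target[y] / target[x]
            if hastings:
                ratio *= proposal[y][x] / proposal[x][y]  # Hastings correction
            P[x][y] = proposal[x][y] * min(1.0, ratio)
        P[x][x] = 1.0 - sum(P[x])  # rejected proposals stay put
    return P


def max_balance_violation(target, P):
    """Largest |pi(x) P(x, y) - pi(y) P(y, x)| over all state pairs."""
    n = len(target)
    return max(abs(target[x] * P[x][y] - target[y] * P[y][x])
               for x in range(n) for y in range(n))


if __name__ == "__main__":
    target = [0.2, 0.3, 0.5]
    Q = [[0.0, 0.5, 0.5],
         [0.9, 0.0, 0.1],
         [0.5, 0.5, 0.0]]
    for h in (True, False):
        P = mh_transition_matrix(target, Q, hastings=h)
        print(f"hastings={h}: max violation {max_balance_violation(target, P):.3g}")
```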

H.Nik · April 2, 2018, 2:12am

Thank you so much for the helpful hint about the detailed balance.
