Dual Averaging Explanations in NUTS

Hi, I am learning about dual averaging from the 2014 NUTS paper. Could anyone help explain why the updates in equation (6) were defined as attached? What are the exact expressions of the dual and primal problems here? The paper by Nesterov (2009) had very clear expressions for the primal and dual problems; the 2014 paper mentions neither. In particular, why is the average \bar{x}_{t+1} defined this way? And why is the update of x_{t+1} written in terms of the summation of the H_i rather than in terms of x_t? What, then, is the relationship between x_{t+1} and x_t? And why is this update called dual averaging specifically?

Thank you for your help in advance.


Can you elaborate a bit, for instance show how the expressions from Nesterov (2009) for the primal/dual problems are clear, and how that is different from what is presented in Hoffman, Gelman (2014)?

I never went into the details of the step-size optimization for HMC, but just above that passage, on the same page, it is stated that x_{t+1} \leftarrow x_t - \eta_t H_t is used because it guarantees that h(x) (defined as the expected value of H_t conditioned on x) converges to zero. That would be the answer to your question.

But I’m not giving you a real explanation. Another relevant paper cited there is Andrieu & Thoms (2008). I cannot explain it better without delving further into those two papers, but @andrewgelman and others here may be able to give a quick, informal explanation that makes this clearer.

Sorry I can’t be more helpful right now.
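That said, it may help to see equation (6) written out as code. This is just my own Python sketch, not code from the paper; the names delta, gamma, t0, kappa follow the paper's notation, with the default values the paper suggests, and mu is log(10 * eps_0) with eps_0 = 1:

```python
import math

def dual_averaging(alpha_stats, delta=0.65, mu=math.log(10.0),
                   gamma=0.05, t0=10.0, kappa=0.75):
    """Dual-averaging step-size adaptation, following eq. (6) of
    Hoffman & Gelman (2014).  alpha_stats is the sequence of per-iteration
    acceptance statistics alpha_t; returns the averaged iterate x_bar
    (the log step size)."""
    h_sum = 0.0   # running sum of H_i = delta - alpha_i
    x_bar = 0.0   # averaged iterate \bar{x}_t
    for t, alpha in enumerate(alpha_stats, start=1):
        h_sum += delta - alpha
        # primal iterate: x_{t+1} = mu - sqrt(t) / (gamma * (t + t0)) * sum H_i
        x = mu - math.sqrt(t) / (gamma * (t + t0)) * h_sum
        # averaging step: x_bar_{t+1} = eta_t * x_{t+1} + (1 - eta_t) * x_bar_t
        eta = t ** (-kappa)
        x_bar = eta * x + (1 - eta) * x_bar
    return x_bar
```

If the acceptance statistics sit exactly at the target delta, every H_t is zero and x_bar stays at mu; persistently low acceptance makes the running sum positive and drives the log step size below mu.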


I think it is still an open question for me :) Thanks


I’m not sure if it helps, but the Stan source has separate expressions for the dual and primal parts. I think s_bar_ is the average gradient (the dual part).
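Roughly, the adaptation step looks like the following. This is a Python transcription of my reading of Stan's stepsize_adaptation, so take the details with a grain of salt; s_bar_ accumulates the weighted average of the "gradients" H_i = delta - alpha_i (the dual part), and x is the primal iterate:

```python
import math

def learn_stepsize(state, adapt_stat, mu,
                   delta=0.8, gamma=0.05, t0=10.0, kappa=0.75):
    """One dual-averaging update in running-average form.

    state = (counter, s_bar, x_bar).  s_bar is the weighted average of
    the H_i = delta - alpha_i (the dual part); x is the primal iterate
    (the log step size).  Returns (new_state, step_size)."""
    counter, s_bar, x_bar = state
    counter += 1
    # dual part: running average of H_i with weight 1/(counter + t0),
    # so that s_bar = (sum of H_i) / (counter + t0)
    eta = 1.0 / (counter + t0)
    s_bar = (1.0 - eta) * s_bar + eta * (delta - min(adapt_stat, 1.0))
    # primal part: x = mu - sqrt(t) * s_bar / gamma, which is exactly
    # eq. (6) once s_bar is expanded back into the sum
    x = mu - math.sqrt(counter) * s_bar / gamma
    x_eta = counter ** (-kappa)
    x_bar = (1.0 - x_eta) * x_bar + x_eta * x
    return (counter, s_bar, x_bar), math.exp(x)
```

So the "average of gradients" form and the "sum of gradients" form in equation (6) are the same update; the 1/(t + t0) weight just converts the sum into an average.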

Rearranging equation (6),

$$x_{t+1} = \mu - \frac{\sqrt{t}}{\gamma\,(t + t_0)} \sum_{i=1}^{t} H_i,$$

which implies that

$$\sum_{i=1}^{t-1} H_i = \frac{\gamma\,(t - 1 + t_0)}{\sqrt{t - 1}}\,(\mu - x_t),$$

and now we can derive the relationship between x_{t+1} and x_t:

$$x_{t+1} = \mu - \frac{\sqrt{t}}{\gamma\,(t + t_0)} \left[\frac{\gamma\,(t - 1 + t_0)}{\sqrt{t - 1}}\,(\mu - x_t) + H_t\right].$$
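This rearrangement can be checked numerically. In the sketch below the H_i are arbitrary made-up stand-ins for delta - alpha_i, and gamma, t0, mu take the paper's suggested values:

```python
import math
import random

# Check that the summation form of eq. (6) and the rearranged recursion
# between x_{t+1} and x_t agree.
gamma, t0, mu = 0.05, 10.0, math.log(10.0)
random.seed(0)
H = [random.uniform(-0.5, 0.5) for _ in range(50)]

def x_sum(t):
    """Summation form: x_{t+1} = mu - sqrt(t)/(gamma*(t + t0)) * sum_{i<=t} H_i."""
    return mu - math.sqrt(t) / (gamma * (t + t0)) * sum(H[:t])

# Recursive form: recover sum_{i<=t-1} H_i from x_t, then add H_t.
for t in range(2, 51):
    prev_sum = gamma * (t - 1 + t0) / math.sqrt(t - 1) * (mu - x_sum(t - 1))
    rhs = mu - math.sqrt(t) / (gamma * (t + t0)) * (prev_sum + H[t - 1])
    assert abs(x_sum(t) - rhs) < 1e-9
print("recursion matches summation form")
```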