Is it possible to specify the SD of the proposal distribution from which candidate samples are drawn in Stan/HMC, as in the Metropolis-Hastings algorithm? Thanks!

The proposals in HMC and Metropolis Hastings are quite different, and the set of next possible draws in NUTS is different still. This website has some neat animations: https://chi-feng.github.io/mcmc-demo/app.html

For more detail, the material in https://arxiv.org/pdf/1701.02434.pdf is very close to what is currently implemented in Stan.

Thank you so much! One more question: if I want to decide whether to use HMC/Stan or MH for my model based on efficiency, what criteria should I look at? The running time? The acceptance rate (although in MH I can adjust the SD to increase the acceptance rate, and a high acceptance rate is not always good for MH)? Or other things? Thanks!

Effective sample size per second or effective sample size per iteration is probably what you want.
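As a rough sketch of how those two metrics are computed once you have an ESS estimate for each sampler: the function name `efficiency`, the sampler labels, and all the numbers below are made up purely for illustration.

```python
def efficiency(ess, n_iterations, seconds):
    """Return (ESS per iteration, ESS per second) for one sampler run."""
    return ess / n_iterations, ess / seconds


# Hypothetical numbers, invented for illustration only (not real benchmarks).
runs = {
    "sampler_a": {"ess": 900.0, "n_iterations": 1000, "seconds": 12.0},
    "sampler_b": {"ess": 150.0, "n_iterations": 20000, "seconds": 3.0},
}

for name, run in runs.items():
    per_iter, per_sec = efficiency(**run)
    print(f"{name}: ESS/iteration = {per_iter:.4f}, ESS/second = {per_sec:.1f}")
```

ESS per second is usually the fairer comparison when the cost of one iteration differs a lot between samplers, since a sampler with cheap iterations can still lose if its draws are highly correlated.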

The posterior package (here) exposes the same algorithm for computing ESS as Stan uses, so if you have draws from another sampler, you can compute ESS for them the same way and compare the results to Stan.

(you’ll also want to check somehow that Stan and the other thing actually are returning the same thing)

Thank you for your reply! Is ESS the same as acceptance rate?

ESS is effective sample size. It is used analogously to the sample size in a regular Monte Carlo estimator. Because sequential draws in MCMC are correlated, 1000 draws from MCMC are not equivalent to 1000 independent draws from a Monte Carlo estimator.

ESS is computed so that if you have an ESS of 570 from your MCMC sampler, that should be about the same as 570 draws from the distribution you want. This post links to the latest doc on our ESS: New R-hat and ESS
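To make the 570 figure concrete, here is the idealized single-chain definition, where N is the number of draws and \rho_t is the lag-t autocorrelation of the chain. Stan's actual estimator combines several chains and truncates the sum, so treat this as a sketch of the idea rather than the exact computation. The autocorrelation \rho_t = 0.5^t is an assumption chosen only so the arithmetic works out cleanly.

```latex
\mathrm{ESS} = \frac{N}{1 + 2\sum_{t=1}^{\infty} \rho_t},
\qquad
\rho_t = 0.5^t \;\Rightarrow\; \sum_{t=1}^{\infty} \rho_t = 1
\;\Rightarrow\; \mathrm{ESS} = \frac{N}{3},
\qquad
N = 1710 \;\Rightarrow\; \mathrm{ESS} = 570.
```

So under that assumed autocorrelation, 1710 correlated draws would carry roughly the information of 570 independent ones.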

Thank you a lot!

A more accurate statement would be: ESS is computed so that if you have an ESS of 570 for estimating E[f_1(\theta)] using your MCMC sample, that should provide about the same accuracy as estimating E[f_1(\theta)] using 570 independent draws from the distribution you want. It has been common to report ESS for estimating E[\theta] (i.e., with f_1 the identity function), but it’s useful to remember that ESS for estimating, for example, E[\theta^2] can be very different. The new R-hat and ESS paper has recently been published in Bayesian Analysis: https://projecteuclid.org/euclid.ba/1593828229.
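A small sketch of that last point, under loudly stated assumptions: the "MCMC sample" here is a simulated Gaussian AR(1) chain standing in for real sampler output, and `simple_ess` is a simplified single-chain estimator (sum the autocorrelations until the first non-positive one), not the rank-normalized multi-chain estimator that Stan and the posterior package implement. For a zero-mean Gaussian AR(1) process with coefficient phi, the lag-t autocorrelation of \theta is phi^t while that of \theta^2 is phi^(2t), so the two ESS values should differ.

```python
import math
import random


def simple_ess(x):
    """Simplified single-chain ESS: n / (1 + 2 * sum of positive-run autocorrelations).

    Illustrative only; this is NOT Stan's estimator.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    var = sum(c * c for c in centered) / n
    rho_sum = 0.0
    for lag in range(1, n):
        cov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = cov / var
        if rho <= 0:  # truncate at the first non-positive autocorrelation
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


def ar1_chain(n, phi, rng):
    """Zero-mean Gaussian AR(1) chain started from its stationary distribution."""
    x = [rng.gauss(0.0, 1.0 / math.sqrt(1.0 - phi ** 2))]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


rng = random.Random(1)
n = 20000
theta = ar1_chain(n, phi=0.9, rng=rng)

ess_theta = simple_ess(theta)                      # ESS for estimating E[theta]
ess_theta_sq = simple_ess([t * t for t in theta])  # ESS for estimating E[theta^2]

print(f"draws: {n}")
print(f"ESS for E[theta]:   {ess_theta:.0f}")
print(f"ESS for E[theta^2]: {ess_theta_sq:.0f}")
```

The same draws give two different ESS values depending on which expectation is being estimated, which is why it helps to report ESS for the quantity you actually care about.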