Pseudo-extended MCMC

Thanks for taking the time to respond. I'm admittedly a bit of an arXiv addict, but I'm also fairly quick to discard papers that are incremental or overblown, so I only bring this paper up because I think 1) it doesn't ring any of my arXiv alarm bells, 2) they implemented it in Stan, making it community-relevant, and 3) my remaining concerns are best answered by this community.

If you're right that they're comparing against standard, poorly tuned HMC, I would agree that's a concern. Since it's implemented in Stan, I admit I had assumed they meant NUTS; they would have to go out of their way not to use it.

The number of pseudo-samples is a hyperparameter, true, but my impression is that the gain in efficacy comes from each pseudo-sample smoothing the geometry of the target density, at the cost of increasing the dimensionality of the sampling space. NUTS/HMC does mix more slowly in higher-dimensional spaces, right?
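
To make that tradeoff concrete, here's a rough Python sketch of the extended target as I understand it from the paper (so treat the exact form as my reading, not gospel): the N pseudo-samples are sampled jointly under a density proportional to the product of an instrumental density q over all pseudo-samples, times the average importance ratio gamma(x_i)/q(x_i). With N = 2 this doubles the dimension of the space HMC has to explore, which is where my dimensionality worry comes from. The function and argument names below are mine, not the authors'.

```python
import numpy as np


def pseudo_extended_logpdf(xs, log_target, log_instrumental):
    """Unnormalized log density on the extended space of N pseudo-samples.

    As I read the paper, the extended target is proportional to
        prod_j q(x_j) * (1/N) * sum_i gamma(x_i) / q(x_i),
    where gamma is the unnormalized target and q an instrumental density,
    so its log is
        sum_j log q(x_j) + logsumexp_i(log gamma(x_i) - log q(x_i)) - log N.
    """
    log_q = np.array([log_instrumental(x) for x in xs])
    log_gamma = np.array([log_target(x) for x in xs])
    log_ratio = log_gamma - log_q           # importance ratios in log space
    m = log_ratio.max()                     # log-sum-exp for numerical stability
    lse = m + np.log(np.exp(log_ratio - m).sum())
    return log_q.sum() + lse - np.log(len(xs))


# Toy illustration: bimodal 1-D target, broad Gaussian instrumental density,
# and N = 2 pseudo-samples as in the quoted experiments.
if __name__ == "__main__":
    log_target = lambda x: np.logaddexp(-0.5 * ((x - 3.0) / 0.5) ** 2,
                                        -0.5 * ((x + 3.0) / 0.5) ** 2)
    log_instrumental = lambda x: -0.5 * (x / 5.0) ** 2
    xs = [0.5, -2.0]                        # N = 2 scalar pseudo-samples
    print(pseudo_extended_logpdf(xs, log_target, log_instrumental))
```

The logsumexp over importance ratios is, I think, where the smoothing comes from: the joint density stays appreciable as long as any one pseudo-sample sits in a good region.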

Edit: they are using NUTS throughout:
"We use the NUTS (Hoffman and Gelman, 2014) tuning algorithm for HMC as implemented
within STAN (Carpenter et al., 2017) for both standard HMC and pseudo-extended HMC, with
N = 2 pseudo-samples and both algorithms ran for 10,000 iterations, with the first half of
the chain removed as brun-in. "

Edit 2: with respect to multimodal navigation being NP-hard, I agree that this obviously doesn't resolve it. Generalizing the intuition from Figure 1 to higher dimensions, I think the effect of the method is to connect the modes via a trellis-like structure; since that trellis is combinatorial in nature, navigating it is still NP-hard.