Experiences with the zig-zag process MCMC?

Hi everyone,
I stumbled upon this paper [1], published last year (the preprint appeared two years ago).
It describes a new MCMC algorithm based on the zig-zag process.
I wonder if anyone has tried to implement this algorithm.
It claims higher efficiency compared to MALA (I wonder why they did not compare against HMC?) in both a standard and a subsampling setting.

[1] Bierkens, Joris; Fearnhead, Paul; Roberts, Gareth. The Zig-Zag process and super-efficient sampling for Bayesian analysis of big data. Ann. Statist. 47 (2019), no. 3, 1288–1320. doi:10.1214/18-AOS1715.


This strikes me as one that Mr @betanalpha might have insights on.


Because it doesn’t compare to well-implemented Hamiltonian Monte Carlo…

There have been a variety of gradient-based methods introduced since the advent of Hamiltonian Monte Carlo, but because none of them actually captures the relevant structure of the target's typical set, none has been able to outperform Hamiltonian Monte Carlo.

In one dimension the Zig-Zag process works by making translations that exactly preserve the target distribution. To generalize to higher-dimensional spaces one has to choose a single direction in which to make the translation. The baseline implementation chooses one of the diagonals, so the resulting Markov chain looks like a “zig zag” through the parameter space.
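To make the mechanics concrete, here is a minimal one-dimensional sketch of my own (a toy implementation, not the authors' code) for a standard normal target. The velocity flips at the events of an inhomogeneous Poisson process with rate max(0, θ·U′(x)); for this target the integrated rate is piecewise quadratic, so the event times can be drawn exactly by inversion.

```python
import numpy as np

def zigzag_1d_standard_normal(T=2000.0, dt=0.1, seed=1):
    # State is (x, theta) with theta in {-1, +1}; x moves at constant
    # velocity theta, and theta flips at the events of an inhomogeneous
    # Poisson process with rate max(0, theta * U'(x)), where U(x) = x^2 / 2.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal()   # start from the target so no warm-up is needed
    theta = 1.0
    t, next_grid = 0.0, 0.0
    draws = []
    while t < T:
        a = theta * x                                    # rate at the current point
        e = rng.exponential()                            # Exp(1) threshold
        tau = -a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * e)   # exact time to the next flip
        # record the continuous trajectory on a regular time grid
        while next_grid <= t + tau and next_grid < T:
            draws.append(x + theta * (next_grid - t))
            next_grid += dt
        x += theta * tau      # translate up to the flip
        theta = -theta        # flip the velocity
        t += tau
    return np.asarray(draws)

draws = zigzag_1d_standard_normal()
print(draws.mean(), draws.var())  # should be close to 0 and 1
```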

The problem is that the typical set the Markov chain needs to explore isn't straight – in simple cases it's closer to spherical. Consequently these translations can only go so far before they start to leave the concentration of high probability mass. As the dimension increases the translations get shorter and shorter, the direction sampling gets more and more chances to point back toward the previous point, and the algorithm starts to look like a diffusion.
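You can see the shrinking translations directly with a small experiment (again my own sketch, reusing the exact per-coordinate event-time draws from above, which is valid here because the target factorizes): estimate the average time between velocity flips for a d-dimensional standard normal and watch it shrink, roughly like 1/d for this target.

```python
import numpy as np

def mean_time_between_flips(d, n_events=20000, seed=1):
    # Zig-Zag on a d-dimensional standard normal. Each coordinate i flips
    # at rate max(0, theta_i * x_i); because the target factorizes, each
    # coordinate's next flip time can be drawn exactly (same inversion as
    # in the 1-D sketch) and the process jumps to the earliest one,
    # flipping only that coordinate's velocity.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(d)                # start in the typical set
    theta = rng.choice([-1.0, 1.0], size=d)
    gaps = []
    for _ in range(n_events):
        a = theta * x
        e = rng.exponential(size=d)
        tau = -a + np.sqrt(np.maximum(a, 0.0) ** 2 + 2.0 * e)
        i = np.argmin(tau)                    # first coordinate to flip
        x += theta * tau[i]                   # straight-line translation
        theta[i] = -theta[i]
        gaps.append(tau[i])
    return float(np.mean(gaps))

for d in (1, 10, 100, 1000):
    print(d, mean_time_between_flips(d))      # segments shrink as d grows
```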

The Bouncy Particle Sampler is another algorithm that turns out to be essentially equivalent to the Zig-Zag sampler, and it runs into a similar issue: as the dimension increases the bouncing starts to dominate over the translations in between bounces.

When experimenting with a new algorithm I recommend trying to find some motivation for why it should work better and then validate that motivation with experiments. The reality is that most algorithms will not scale and so a motivation might not exist, but the experiments will help build your understanding as to why.


@betanalpha Thanks for the great insight!

One more thing: I've read your paper about the unsoundness of subsampling HMC.
Do you have any opinion on more recent work on subsampling HMC, for example HMC with energy-conserving subsampling?

The energy-conserving subsampling method is more robust in that it will control the bias, but only at the expense of significantly reduced performance. The problems with subsampling can't be overcome with algorithmic intervention alone; when the data aren't sufficiently redundant, all you can do is move the pathologies around.

That said, using something like the energy-conserving subsampling method, or any method with a proper correction, is the most robust way to explore how redundant your data might be, as the performance will provide feedback about the structure of your data.


Thanks for your comments!