Experiences with the zig-zag process MCMC?

Hi everyone,
I stumbled upon this paper [1], published last year (the preprint appeared two years earlier). It introduces a new MCMC algorithm based on the zig-zag process. I wonder if anyone has tried to implement it. The paper claims higher efficiency than MALA in both the standard and the subsampling setting (I wonder why they did not compare against HMC?).

[1] Bierkens, Joris; Fearnhead, Paul; Roberts, Gareth. The Zig-Zag process and super-efficient sampling for Bayesian analysis of big data. Ann. Statist. 47 (2019), no. 3, 1288–1320. doi:10.1214/18-AOS1715.


This strikes me as one that Mr @betanalpha might have insights on.


Because it doesn’t compare to well-implemented Hamiltonian Monte Carlo…

There have been a variety of gradient-based methods introduced since the advent of Hamiltonian Monte Carlo, but because none of them actually capture the right structure of the target typical set, none have been able to outperform Hamiltonian Monte Carlo.

In one dimension, Zig-Zag works by making translations that exactly preserve the target distribution. To generalize to higher-dimensional spaces, one has to choose a single direction in which to make each translation. The baseline implementation chooses one of the diagonals, hence the resulting Markov chain looks like a “zig zag” through the parameter space.
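To make the one-dimensional mechanism concrete, here is a minimal sketch (not the paper's implementation) of a 1D zig-zag sampler for a standard normal target. With potential U(x) = x²/2 and velocity θ ∈ {−1, +1}, the canonical event rate is λ(t) = max(0, θ(x + θt)) = max(0, θx + t), which is simple enough to invert exactly, so no thinning step is needed for this particular target:

```python
import numpy as np

rng = np.random.default_rng(0)

def zigzag_1d_gaussian(n_events=20000, dt=0.1):
    """Toy 1D zig-zag sampler for a standard normal target, U(x) = x^2/2.

    Between events the state moves deterministically, x(t) = x + theta*t;
    at each event the velocity theta flips sign. Event times come from
    inverting the integrated rate Lambda(T) = int_0^T max(0, theta*x + t) dt
    against a unit exponential draw.
    """
    x, theta = 0.0, 1.0
    samples = []
    for _ in range(n_events):
        a = theta * x
        e = rng.exponential()
        # Exact event time: solve Lambda(tau) = e in closed form.
        tau = -a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        # Discretize the deterministic segment to collect samples
        # (estimates should really be trajectory averages; a fine grid
        # along the path is a simple stand-in).
        t = 0.0
        while t + dt < tau:
            t += dt
            samples.append(x + theta * t)
        x += theta * tau
        theta = -theta  # flip the velocity at the event
    return np.array(samples)

samples = zigzag_1d_gaussian()
print(samples.mean(), samples.var())
```

The mean and variance of the collected samples should be close to 0 and 1. In higher dimensions the rate is a sum of per-coordinate rates and only the component that triggers the event flips, which is where the diagonal-translation picture above comes from.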

The problem is that the typical set that the Markov chain needs to explore isn’t straight – in simple cases it’s more spherical. Consequently these translations can only go so far before they would start to leave the concentration of high probability mass. As the dimension increases the translations get shorter and shorter, the direction sampling gets more chances to point back in the direction of the previous point, and the algorithm starts to look more like a diffusion.

The Bouncy Particle Sampler is another algorithm that turns out to be essentially equivalent to the Zig Zag sampler and runs into a similar issue (as the dimensions increase the bouncing starts to dominate over the translations in between bounces).

When experimenting with a new algorithm I recommend trying to find some motivation for why it should work better and then validate that motivation with experiments. The reality is that most algorithms will not scale and so a motivation might not exist, but the experiments will help build your understanding as to why.


@betanalpha Thanks for the great insight!