Experiences with the zig-zag process MCMC?

Hi everyone,
I stumbled upon this paper [1], published last year (the preprint appeared two years ago).
It describes a new MCMC algorithm based on the zig-zag process.
I wonder if anyone has tried to implement this algorithm.
It claims higher efficiency compared to MALA (I wonder why they did not compare against HMC?) in both a standard and a subsampling setting.

[1] Bierkens, Joris; Fearnhead, Paul; Roberts, Gareth. The Zig-Zag process and super-efficient sampling for Bayesian analysis of big data. Ann. Statist. 47 (2019), no. 3, 1288–1320. doi:10.1214/18-AOS1715.


This strikes me as one that Mr @betanalpha might have insights on.


Because it doesn’t compare to well-implemented Hamiltonian Monte Carlo…

There have been a variety of gradient-based methods introduced since the advent of Hamiltonian Monte Carlo, but because none of them actually captures the relevant structure of the target's typical set, none has been able to outperform Hamiltonian Monte Carlo.

In one dimension the Zig-Zag process works by making translations that exactly preserve the target distribution. To generalize to higher-dimensional spaces one has to choose a single direction in which to make the translation. The baseline implementation chooses one of the diagonals, so the resulting Markov chain looks like a “zig zag” through the parameter space.
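To make the mechanics concrete, here is a minimal one-dimensional sketch of my own (a toy implementation, not the authors' code) for a standard normal target. The velocity flips at the events of an inhomogeneous Poisson process with rate max(0, θ·U′(x)); for this target the integrated rate is piecewise quadratic, so the event times can be drawn exactly by inversion.

```python
import numpy as np

def zigzag_1d_standard_normal(T=2000.0, dt=0.1, seed=1):
    # State is (x, theta) with theta in {-1, +1}; x moves at constant
    # velocity theta, and theta flips at the events of an inhomogeneous
    # Poisson process with rate max(0, theta * U'(x)), where U(x) = x^2 / 2.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal()   # start from the target so no warm-up is needed
    theta = 1.0
    t, next_grid = 0.0, 0.0
    draws = []
    while t < T:
        a = theta * x                                    # rate at the current point
        e = rng.exponential()                            # Exp(1) threshold
        tau = -a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * e)   # exact time to the next flip
        # record the continuous trajectory on a regular time grid
        while next_grid <= t + tau and next_grid < T:
            draws.append(x + theta * (next_grid - t))
            next_grid += dt
        x += theta * tau      # translate up to the flip
        theta = -theta        # flip the velocity
        t += tau
    return np.asarray(draws)

draws = zigzag_1d_standard_normal()
print(draws.mean(), draws.var())  # should be close to 0 and 1
```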

The problem is that the typical set the Markov chain needs to explore isn't straight – in simple cases it's closer to spherical. Consequently these translations can only go so far before they start to leave the concentration of high probability mass. As the dimension increases the translations get shorter and shorter, the direction sampling gets more and more chances to point back toward the previous point, and the algorithm starts to look like a diffusion.
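You can see the shrinking translations directly with a small experiment (again my own sketch, reusing the exact per-coordinate event-time draws from above, which is valid here because the target factorizes): estimate the average time between velocity flips for a d-dimensional standard normal and watch it shrink, roughly like 1/d for this target.

```python
import numpy as np

def mean_time_between_flips(d, n_events=20000, seed=1):
    # Zig-Zag on a d-dimensional standard normal. Each coordinate i flips
    # at rate max(0, theta_i * x_i); because the target factorizes, each
    # coordinate's next flip time can be drawn exactly (same inversion as
    # in the 1-D sketch) and the process jumps to the earliest one,
    # flipping only that coordinate's velocity.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(d)                # start in the typical set
    theta = rng.choice([-1.0, 1.0], size=d)
    gaps = []
    for _ in range(n_events):
        a = theta * x
        e = rng.exponential(size=d)
        tau = -a + np.sqrt(np.maximum(a, 0.0) ** 2 + 2.0 * e)
        i = np.argmin(tau)                    # first coordinate to flip
        x += theta * tau[i]                   # straight-line translation
        theta[i] = -theta[i]
        gaps.append(tau[i])
    return float(np.mean(gaps))

for d in (1, 10, 100, 1000):
    print(d, mean_time_between_flips(d))      # segments shrink as d grows
```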

The Bouncy Particle Sampler is another algorithm that turns out to be essentially equivalent to the Zig-Zag sampler, and it runs into a similar issue: as the dimension increases the bouncing starts to dominate over the translations in between bounces.

When experimenting with a new algorithm I recommend trying to find some motivation for why it should work better and then validate that motivation with experiments. The reality is that most algorithms will not scale and so a motivation might not exist, but the experiments will help build your understanding as to why.


@betanalpha Thanks for the great insight!

One more thing: I've read your paper about the unsoundness of subsampling HMC.
Do you have any opinion on more recent work on subsampling HMC, for example HMC with energy-conserving subsampling?

The energy-conserving subsampling method is more robust in that it will control the bias, but only at the expense of significantly reduced performance. The problems with subsampling can't be overcome with algorithmic intervention alone; when the data aren't sufficiently redundant, all you can do is move the pathologies around.

That said, using something like the energy-conserving subsampling method, or any method with a proper correction, is the most robust way to explore how redundant your data might be, as the performance will provide feedback about the structure of your data.


Thanks for your comments!