The typical set and its relevance to Bayesian computation

Talking about log p rather than the typical set has been very liberating, and now it is making me think of some other questions.

  1. One way to think of the typical set is in terms of level sets of log p. The typical set of theta corresponds to the level sets of log p(theta) near its expected value. But then this makes me thing of . . . slice sampling! In slice sampling, we’re implicitly reparameterizing the density in terms of level sets of p, which is mathematically equivalent to level sets of log p.

In particular, slice sampling has two steps: Draw theta from within the level set of constant p(theta), and draw a value of p(theta) given theta. That first step (draw theta given p(theta)) is a lot like HMC. This is no surprise–the original NUTS algorithm uses slice sampling–but at the very least it’s helping me understand what people have been talking about when they say “typical set.”

  1. Hamiltonian paths have constant energy: that’s potential energy (i.e., - log p) and kinetic energy (from the speed of the particle moving through space). In his famous 2011 article, Radford talked about the challenge of HMC that it stays in a narrow band of potential energy. HMC can move efficiently within the level sets of log p, but it has random walk behavior in the dimension of log p itself.

But this makes me wonder: are there some HMC diagnostics that would help us understand this? We’re already monitoring log p and have a sense of how it varies. But we also know the distribution of kinetic energy of the Hamiltonian paths. As Bob put it (https://statmodeling.stat.columbia.edu/2020/08/02/the-typical-set-and-its-relevance-to-bayesian-computation/#comment-1399953), “In HMC, we use the gradient not to literally climb toward the mode, but to exert force toward the mode on our simulated particle that’s balanced with the particles momentum to keep the flow in the volume where we want to sample.”

So it seems that we should have internal evidence from our HMC runs (in warmup as well as in sampling) as to how the sampler is moving within and between level sets of log p. I’m wondering if this sort of thinking could move us beyond talking about the typical set?

2 Likes