The typical set and its relevance to Bayesian computation

I think if the log density goes down during those short runs, the starting point was towards the mode, and if it goes up, the starting point was in the tail (remember lp is the negative of the potential energy in HMC; this confuses me sometimes as well).
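
For concreteness, the sign convention that causes the confusion (this is just standard HMC, nothing specific to the proposal here):

$$
U(q) = -\log \pi(q \mid y) = -\,\mathrm{lp}(q), \qquad H(q, p) = U(q) + \tfrac{1}{2}\, p^\top M^{-1} p
$$

So high lp means low potential energy: the mode is the bottom of the energy bowl, and the tails are up the sides.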

This is exactly the idea I had in mind for the “warmup phase” to figure out where the threshold should be: run an optimization algorithm for N steps, then spawn off short MCMC runs to see whether you’re at the equilibrium lp yet. Then take the 0.01 quantile of a short chain that has bounced around… and call that the threshold.
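
A minimal sketch of that warmup, assuming stand-in functions `optimize_step` and `mcmc_step` (both hypothetical; any optimizer and any diffusive MCMC kernel would do here):

```python
import numpy as np

def find_lp_threshold(q0, log_density, optimize_step, mcmc_step,
                      n_opt=200, n_mcmc=500, quantile=0.01):
    """Sketch of the warmup idea: climb out of the far tail with an
    optimizer, let a short chain equilibrate, then take a low quantile
    of its lp trace as the boundary of the high-probability set."""
    q = q0
    # Phase 1: N optimizer steps toward the typical set.
    for _ in range(n_opt):
        q = optimize_step(q, log_density)

    # Phase 2: a short MCMC run; record the lp trace.
    lp_trace = np.empty(n_mcmc)
    for i in range(n_mcmc):
        q = mcmc_step(q, log_density)
        lp_trace[i] = log_density(q)

    # Crude equilibrium check: if lp is still trending between the two
    # halves of the trace, we haven't reached the equilibrium lp yet.
    half = n_mcmc // 2
    still_trending = (abs(lp_trace[:half].mean() - lp_trace[half:].mean())
                      > lp_trace.std())
    if still_trending:
        raise RuntimeError("lp still trending; run warmup longer")

    # Threshold = low quantile of the chain that "bounced around".
    return np.quantile(lp_trace, quantile)
```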

It’s not quite importance sampling. The idea of replica exchange (AKA parallel tempering; closely related to simulated tempering, umbrella sampling, and a lot of other things) is that you run two chains in parallel, and every so often you try to swap their states. The swap is accepted according to the usual Metropolis acceptance probability. In this case, since one chain is sampling “uniform on the high-probability set” and the other is doing diffusive MCMC on the proper posterior, you can pretty easily work out the Metropolis acceptance probability. From the perspective of the diffusive MCMC chain it’s more or less: “someone just proposed this uniform-on-the-high-probability-set sample; now I’ll accept it with what amounts to an importance weight, to get a sample from my posterior.”
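
To make that acceptance probability concrete: the uniform density is the same constant at every point inside the set and zero outside, so those factors cancel out of the standard replica-exchange Metropolis ratio, leaving only a ratio of posterior densities. A sketch (here `lp` means the posterior log density, and `lp_threshold` is the boundary of the set):

```python
import numpy as np

def swap_log_accept(lp_uniform_state, lp_posterior_state, lp_threshold):
    """Log acceptance probability for swapping states between the
    uniform-on-the-set chain and the posterior chain.  When both states
    lie inside the set, the uniform density is the same constant at
    both, so it cancels and the Metropolis ratio reduces to
        min(1, p(x_u) / p(x_p)) = min(1, exp(lp(x_u) - lp(x_p))).
    If the posterior chain has wandered outside the set, the uniform
    density at its state is zero and the swap is rejected outright."""
    if lp_posterior_state <= lp_threshold:
        return -np.inf  # posterior state outside the set: reject swap
    return min(0.0, lp_uniform_state - lp_posterior_state)
```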

So the diffusive MCMC is just a way to “warp” the “uniform on the high-probability set” samples into proper posterior samples, much as importance weights warp a proposal distribution into the target. Except instead of carrying weights around, you accept swaps with probabilities determined by those weights. It’s a little like SMC (sequential Monte Carlo) reweighting.

Here are some of the advantages:

  1. The white box sampler NEVER rejects a sample: since all locations in the set have equal probability, it just bounces off the walls, always moving forward (see the sketch after this list). The main place it will have problems is in tight, funnel-like parts of the posterior, where it may hit the boundary constantly, basically just rattling around in a narrow neck.

  2. The white box sampler doesn’t need ANY gradients, yet it moves long distances. It is in fact an HMC sampler on a target with zero gradient everywhere (a uniform distribution on a complex, strangely shaped set).

  3. The white box sampler can be run on models that have no gradient at all; that is, likelihood-free / ABC / agent-based models where you just simulate a dataset and then compare it to the dataset of interest.
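
For the curious, here is a toy sketch of the bouncing dynamics from item 1, under some loud assumptions: `log_density` and all the names are hypothetical; the “bounce” is a crude velocity reversal rather than a proper specular reflection off the wall (which would need the wall normal); and periodic direction refreshes stand in for a more careful ergodicity argument. Note that only evaluations of the log density are needed, never its gradient:

```python
import numpy as np

rng = np.random.default_rng(0)

def white_box_sampler(q0, log_density, lp_threshold, n_steps=10_000,
                      step=0.1, refresh_every=50):
    """Toy sketch of bouncing around the high-probability set
    S = {q : lp(q) > lp_threshold}: ballistic moves at constant speed
    inside S.  When a step would leave S we reverse the velocity (a
    crude stand-in for a specular reflection), and we periodically
    refresh the direction so the chain explores the whole set."""
    d = len(q0)
    q = np.asarray(q0, dtype=float)
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    samples = []
    for t in range(n_steps):
        if t % refresh_every == 0:   # refresh the direction of travel
            v = rng.standard_normal(d)
            v /= np.linalg.norm(v)
        q_new = q + step * v
        if log_density(q_new) > lp_threshold:
            q = q_new                # still inside the set: keep going
        else:
            v = -v                   # hit the wall: bounce back
        samples.append(q.copy())
    return np.array(samples)

# Toy check: a standard normal in 10 dimensions, threshold picked by hand.
lp = lambda q: -0.5 * q @ q
draws = white_box_sampler(np.zeros(10), lp, lp_threshold=-12.0)
```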

I think it’s really a neat idea and would love to work on it with people who are interested in this kind of thing.