Because any single probability density function is not a unique representation of a probability distribution defined over the real numbers.
The concept of the “real numbers” as a unique space is kind of a lie; there are in fact infinitely many different spaces of real numbers, each with its own metric and corresponding uniform (i.e. Lebesgue) measure. Probability density functions are defined relative to this uniform measure, and hence change whenever the metric changes. For any two real spaces related by a transformation whose Jacobian determinant is not identically one, the metrics, and the corresponding Lebesgue measures, will not be the same, and consequently the probability density function of a fixed probability distribution will not be the same.
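A minimal sketch of this density transformation, using an Exponential(1) distribution and the reparameterization y = log(x), both chosen purely for illustration:

```python
import numpy as np
from scipy import integrate

# Density of X ~ Exponential(1) with respect to the Lebesgue measure on x > 0.
def f_x(x):
    return np.exp(-x)

# Under the reparameterization y = log(x) the uniform measure changes,
# and the density picks up the Jacobian |dx/dy| = exp(y):
# f_y(y) = f_x(exp(y)) * exp(y) = exp(y - exp(y)).
def f_y(y):
    return np.exp(y - np.exp(y))

# Probabilities encoded by the distribution are invariant...
p_x, _ = integrate.quad(f_x, 1.0, 2.0)
p_y, _ = integrate.quad(f_y, np.log(1.0), np.log(2.0))
print(p_x, p_y)  # both 0.23254...

# ...but the density heights at corresponding points are not:
print(f_x(2.0), f_y(np.log(2.0)))  # differ by the Jacobian factor of 2
```

Same distribution, two different densities; neither density is any more fundamental than the other.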
Unlike probability density functions, abstract probability distributions don’t rely on a reference measure. Under any nice transformation of the ambient space into itself, often called a reparameterization, a probability distribution pushes forward to an equivalent probability distribution on the new space that encodes the same information. In other words, expectation values and probabilities can all be evaluated in either the initial or the final space.
Any quantity derived from an abstract probability distribution will then transform just like that probability distribution, but any quantity derived from a probability density function (for example the mode, the entropy, and the like) will transform in a much more complicated way because it has to take that Jacobian correction into account. In particular, if one tries to formalize the typical set over the real numbers via the entropy definition then that typical set won’t transform cleanly; instead one has to transform the space, work out the new uniform measure, construct the new entropy, and only then construct the new typical set.
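The mode shows this non-invariance directly; a small sketch, using an Exponential(1) distribution and the reparameterization y = log(x), chosen just for illustration:

```python
import numpy as np

# X ~ Exponential(1) has density exp(-x), maximized at x = 0.
# The pushforward density under y = log(x) is f_y(y) = exp(y - exp(y)),
# maximized where 1 - exp(y) = 0, i.e. at y = 0, which maps back to
# x = exp(0) = 1.  The mode does not push forward along with the
# distribution; it is an artifact of the density representation.
ys = np.linspace(-5.0, 3.0, 200001)
f_y = np.exp(ys - np.exp(ys))
y_mode = ys[np.argmax(f_y)]
print(y_mode, np.exp(y_mode))  # approximately 0.0 and 1.0
```

The original mode x = 0 and the image of the new mode x = 1 don’t even agree, despite the two densities representing the exact same distribution.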
Consequently the concentration of measure phenomena exhibited by all notions of the typical set are best formalized in terms of expectation values, which is exactly how these concepts can be generalized from the integers to not only the real numbers but also more sophisticated spaces like manifolds.
This isn’t a question of desire but rather one of explanation.
Any samples from any high-dimensional probability distribution will exhibit concentration of measure behavior, repelling away from the local mode and instead concentrating along extended surfaces that surround the local mode. This includes both exact/independent samples used in Monte Carlo and correlated samples (really sequences) used in Markov chain Monte Carlo. Demonstrations of the former are included in Probabilistic Computation.
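This concentration is easy to see with exact samples; a minimal sketch, assuming a standard multivariate normal target, where the typical radius scales like sqrt(N):

```python
import numpy as np

rng = np.random.default_rng(8675309)

# Exact samples from an N-dimensional standard normal.  The density is
# maximized at the origin, yet the sampled radii concentrate in an
# ever-narrower shell around sqrt(N), far from that mode.
for N in (1, 10, 100, 1000):
    x = rng.standard_normal((10_000, N))
    r = np.linalg.norm(x, axis=1)
    print(N, round(r.mean(), 2), round(r.std(), 2))
```

Already at N = 100 essentially no sample lands anywhere near the mode, and the relative width of the shell keeps shrinking as the dimension grows.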
Hamiltonian Monte Carlo is not designed to explore an explicit typical set. Indeed we never explicitly work out any definition of a typical set when running the method. It is instead designed to explore an abstract probability distribution, and because probability mass concentrates onto these surfaces, Hamiltonian Monte Carlo follows.
The typical set concept is used to explain why random numbers, Markov chains, and the like seem to be repelled away from the mode, and why trying to introduce information from around the mode doesn’t help these algorithms proceed. The concept also proves useful when examining non-sampling-based algorithms, as the behavior of quantities like KL divergences between two distributions can be related to the overlap of the corresponding typical sets.
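A rough sketch of that last point, assuming two d-dimensional standard normal densities offset by a small per-coordinate shift mu, for which KL = d * mu**2 / 2 analytically: even when the two distributions are nearly indistinguishable in any single coordinate, the divergence, and hence the separation of the typical sets, grows linearly with dimension.

```python
import numpy as np

rng = np.random.default_rng(1)

# p = N(0, I_d) and q = N(mu * 1, I_d); per coordinate
# log p(x) - log q(x) = -mu * x + mu**2 / 2, so
# KL(p || q) = E_p[log p - log q] = d * mu**2 / 2.
mu = 0.25
for d in (1, 10, 100, 1000):
    x = rng.standard_normal((10_000, d))
    kl_mc = (-mu * x + 0.5 * mu**2).sum(axis=1).mean()
    print(d, round(kl_mc, 2), d * mu**2 / 2)
```

The Monte Carlo estimate tracks the analytic value, and by d = 1000 the divergence is large enough that the two typical sets have essentially no overlap.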