Zero-inflated probability distribution

These are very, very important questions!

Exactly!

Yes! As you note, probability density functions are defined by the Radon-Nikodym theorem, which always involves two measures. In practice the symmetry of those two arguments is often broken by taking one of those measures to be a “base” measure (or dominating measure), but that’s a consequence of how we use the Radon-Nikodym theorem, not of the theorem itself.

Unfortunately the base measure is often taken for granted, which is why it’s so easily forgotten. People say “probability density function of a distribution \pi” instead of “probability density function of a distribution \pi with respect to the base measure \nu”. But it’s this choice of base measure that determines, for example, how probability density functions transform; see my comment at Why transformations need to be invertible in change of variable in probability theory? - #11 by betanalpha.

Not without imposing additional constraints on any given problem.

Given a probability distribution \pi there will in general be an infinite number of measures \nu that dominate \pi and hence can serve as valid base measures. In some cases the structure of the ambient space over which \pi is defined motivates particular measures, but those measures are not unique in their ability to help represent \pi.
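For a concrete sketch (my own toy example, nothing canonical): take \pi to be a standard normal on the real line. The Lebesgue measure dominates \pi, but so does, say, a standard Cauchy measure, and each choice yields a perfectly valid, but different, density function for the same \pi.

```python
from scipy import stats

# One distribution pi (a standard normal) admits many valid base measures.

# Density of pi with respect to the Lebesgue measure (the familiar pdf):
def density_wrt_lebesgue(x):
    return stats.norm.pdf(x)

# Density of pi with respect to a standard Cauchy base measure nu:
# d(pi)/d(nu) = (d(pi)/d(Lebesgue)) / (d(nu)/d(Lebesgue)).
def density_wrt_cauchy(x):
    return stats.norm.pdf(x) / stats.cauchy.pdf(x)

x = 1.5
print(density_wrt_lebesgue(x))  # one valid density for pi
print(density_wrt_cauchy(x))    # a different, equally valid density for the same pi
```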

For example the structure of discrete spaces naturally motivates the counting measure. Probability density functions with respect to the counting measure are also known as probability mass functions. The counting measure has the nice property of being uniform over the countable elements of any discrete space, and in some sense it’s the notion of uniformity that elevates it above any other measure. This uniformity is also invariant to 1-1 transformations – on a discrete space any 1-1 transformation can be decomposed into permutations, and permutations don’t affect the counting measure.
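To see that last point in action (again just a toy sketch of mine): permute the labels of a three-element space. The probability mass function is simply relabeled, and no correction factor appears because the counting measure itself doesn’t change.

```python
# Probability mass function, i.e. density with respect to the counting measure,
# on the three-element space {0, 1, 2}.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

# A 1-1 transformation of a finite discrete space is just a permutation of labels.
perm = {0: 2, 1: 0, 2: 1}

# The pushforward pmf is a pure relabeling; the counting measure is untouched,
# so no Jacobian-style correction factor ever appears.
pushforward_pmf = {perm[x]: p for x, p in pmf.items()}

print(pushforward_pmf)  # {2: 0.2, 0: 0.5, 1: 0.3}
```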

On real spaces the equivalent notion of uniformity is given by the Lebesgue measure, and indeed when someone says “probability density function” they almost always mean “probability density function with respect to the Lebesgue measure”, whether they are aware of it or not!

Unfortunately the Lebesgue measure doesn’t transform as nicely as the counting measure. Under a general 1-1 transformation, or “reparameterization”, \phi: X \rightarrow Y the Lebesgue measure on X does not transform into the Lebesgue measure on Y! This is because the Lebesgue measure is defined by the local metric structure of a real space, and that structure isn’t preserved under non-linear transformations; another way to think about it is that there isn’t one real space but rather an infinite number of them, each with its own Lebesgue measure.

Consequently when we transform from X to Y we have two different probability density functions that we might consider! There’s the Radon-Nikodym derivative of the transformed distribution \phi_{*} \pi with respect to the Lebesgue measure on X transformed to a measure on Y, \phi_{*} \mathcal{L}_{X},

\pi_{1}(y) = \frac{ \mathrm{d} \phi_{*} \pi }{ \mathrm{d} \phi_{*} \mathcal{L}_{X} } (y),

and the Radon-Nikodym derivative of the transformed distribution \phi_{*} \pi with respect to the Lebesgue measure on Y,

\pi_{2}(y) = \frac{ \mathrm{d} \phi_{*} \pi }{ \mathrm{d} \mathcal{L}_{Y} } (y).

Because \phi_{*} \mathcal{L}_{X} and \mathcal{L}_{Y} are in general different measures these are different functions! In fact the difference between these functions is exactly the “Jacobian determinant” that shows up in discussions of “change of variables”.
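To make the bookkeeping concrete, here’s a small numerical sketch with \pi a standard normal on X and \phi(x) = \exp(x), both choices purely for illustration:

```python
import numpy as np
from scipy import stats

# pi is a standard normal on X; phi(x) = exp(x) maps X to Y = (0, inf), so the
# pushforward phi_* pi is a standard lognormal distribution on Y.
y = 2.7

# pi_1: density of phi_* pi with respect to phi_* Lebesgue_X.  Both measures
# are pushed forward together, so this is just the original density evaluated
# at phi^{-1}(y) = log(y); no Jacobian factor appears.
pi_1 = stats.norm.pdf(np.log(y))

# pi_2: density of phi_* pi with respect to Lebesgue_Y, i.e. the usual
# lognormal pdf, which carries the Jacobian factor |d phi^{-1} / dy| = 1 / y.
pi_2 = stats.lognorm.pdf(y, 1)

print(pi_1, pi_2)                  # two different functions evaluated at the same y
print(np.isclose(pi_2, pi_1 / y))  # True: they differ by exactly the Jacobian determinant
```

The factor of 1/y separating the two is exactly the Jacobian determinant of \phi^{-1} in this example.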

It’s only by explicitly recognizing which base measure is being considered that we can make sure we’re doing this bookkeeping correctly. Otherwise we’re just trying to pattern match poorly-motivated rules!

This question is why I yell so much about these seemingly irrelevant mathematical technicalities.

Probabilistic computational algorithms estimate expectation values of a given probability distribution. Probability density functions with respect to various base measures can be used to evaluate these expectation values as integrals (a result sometimes known as the “Law of the Unconscious Statistician”, but statisticians are weird), but the density functions themselves don’t define them. In particular the expectation values will always be the same no matter which base measure we choose.
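As a quick numerical check, using the same toy setup as above (standard normal \pi and \phi(x) = \exp(x)): integrating a density on X against the Lebesgue measure on X, or a density of the pushforward on Y against the Lebesgue measure on Y, gives the same expectation value.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Expectation of f under pi (standard normal), evaluated with two different
# density representations; the answer cannot depend on that choice.
f = lambda x: x**2

# Representation 1: density of pi with respect to the Lebesgue measure on X.
E_X, _ = quad(lambda x: f(x) * stats.norm.pdf(x), -np.inf, np.inf)

# Representation 2: density of the pushforward distribution under phi(x) = exp(x)
# with respect to the Lebesgue measure on Y, i.e. the lognormal pdf.
E_Y, _ = quad(lambda y: f(np.log(y)) * stats.lognorm.pdf(y, 1), 0, np.inf)

print(E_X, E_Y)  # both ~ 1.0: the expectation value doesn't care about the base measure
```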

In other words any well-defined probabilistic computational method should depend only on the invariant properties of a probability distribution! Any dependence on a particular probability density function is an undesirable artifact of the method or its implementation!

With this insight it becomes much easier to analyze probabilistic computational methods. For example Monte Carlo and Markov chain Monte Carlo use samples drawn from a probability distribution, which have nothing to do with any base measure or probability density function! Once the samples have been generated, Monte Carlo and Markov chain Monte Carlo estimators are completely independent of densities. On the other hand, how the sample generation is implemented can very well depend on a particular choice of base measure, and that dependence manifests in awkward tuning problems for the method.
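A minimal sketch of that point, again with toy choices of distribution and reparameterization: once the samples are in hand, the Monte Carlo estimator is just an average, and reparameterizing the samples changes nothing because no density, and hence no Jacobian, ever enters the estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Samples from pi (standard normal on X); no density function is needed to
# turn the samples into expectation estimates.
x_samples = rng.normal(size=100_000)

# Monte Carlo estimator of E[x^2]: just an average over the samples.
f = lambda x: x**2
print(np.mean(f(x_samples)))  # ~ 1.0

# Reparameterize with phi(x) = exp(x); estimating the same expectation uses
# f composed with phi^{-1} and, again, no density or Jacobian anywhere.
y_samples = np.exp(x_samples)
print(np.mean(f(np.log(y_samples))))  # identical estimate
```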

Ultimately probability density functions (with respect to a particular base measure!) are useful ways to represent probability distributions in practice, but only at the cost of introducing irrelevant structure that we have to make sure does not corrupt our probabilistic computations.

I emphasize many of these points in my writing – check out in particular Probability Theory (For Scientists and Engineers), Rumble in the Ensemble, and Probabilistic Computation if you haven’t had a chance yet.
