I conveniently just put out a case study on prior modeling that discusses these issues in detail, Prior Modeling. See in particular Section 4.1.4 and Section 5.

A probability density function is more formally defined as a *Radon-Nikodym derivative* that updates one probability distribution into another. Given two two probability distributions \pi and \rho defined over the same space (that play nicely with each other) we can recover expectations with respect to \pi from expectations with respect to \rho by

\mathbb{E}_{\pi}[f] = \mathbb{E}_{\rho} \left[ \frac{ \mathrm{d} \pi} { \mathrm{d} \rho } \cdot f \right].

Where the function \frac{ \mathrm{d} \pi} { \mathrm{d} \rho } is the Radon-Nikodym derivative or density between \pi and \rho. Intuitively the density function up-weights f in regions where \pi allocates more probability than \rho and down-weights f in regions where \pi allocates less probability than \rho.

Radon-Nikodym derivatives can also be chained. If we have three distributions \pi, \rho, and \phi then

\mathbb{E}_{\pi}[f]
= \mathbb{E}_{\rho} \left[ \frac{ \mathrm{d} \pi} { \mathrm{d} \rho } \cdot f \right]
= \mathbb{E}_{\phi} \left[ \frac{ \mathrm{d} \pi} { \mathrm{d} \rho } \cdot \frac{ \mathrm{d} \rho} { \mathrm{d} \phi } \cdot f \right].

Equivalently we have chain rule like result,

\frac{ \mathrm{d} \pi} { \mathrm{d} \phi }(x) =
\frac{ \mathrm{d} \pi} { \mathrm{d} \rho }(x) \cdot \frac{ \mathrm{d} \rho} { \mathrm{d} \phi }(x).

Whenever we multiply density functions weâ€™re actually updating one distribution into an intermediate distribution and then into a third final distribution. Critically, however, this requires that we multiply a density between \pi and \rho and a density between \rho and \phi; the densities have to be chained together consistently. Trying to multiply two densities between \pi and \phi, or even a density between \pi and \phi and \rho and \phi, doesnâ€™t make any mathematical sense.

The usual â€śprobability density functionsâ€ť that we encounter are actually the Radon-Nikodym derivatives between probability distributions over the real numbers and the uniform â€śLebesgueâ€ť measure. Multiplying two of these conventional density functions together doesnâ€™t really make any sense because as discussed just above they are density functions with respect to the same base distribution. This is why naive products often have weird, unexpected consequences.

Another place where Radon-Nikodym derivatives appear is in Bayesâ€™ Theorem: a properly normalized version of the likelihood function can be interpreted as the density function between the posterior distribution and the prior distribution. As I discuss in the case study multiplying two density functions together can then be interpreted as two independent likelihood functions, although again this isnâ€™t as useful as it might sound. Because one would be modeling a heuristic likelihood function directly, and not a full observational model that can be evaluated on some heuristic observation to give a likelihood function, the underlying assumptions and their consequences are difficult if not impossible to validate. The information encoded in a likelihood function and the domain expertise encoded in a prior model are not the same!