Estimating compound distributions without numerical integration

Henrik_Singmann · January 15, 2018, 8:45am

This post is a follow-up on the case study by Michael Betancourt on fitting Cauchy distributions. In it he present several alternative parameterizations of the Cauchy, the first of which I want to follow up on here. Specifically, he states that it can be reparameterized as a continuous mixture (or compound) distribution:

I would really like to understand this step more. Specifically, how it is possible to sample from this integral over tau without actually evaluating the integral? Is this a general way to evaluate compound distributions in Stan?

The reason I am asking is because I want to estimate a compound distribution of a von Mises with a Gamma (i.e., simply replace the Normal with a von Mises). This is the model presented in van den Berg et al. (2012)
(equation from the supplemental materials).

compound2

I guess this does not work for the von Mises, because the von Mises is not a member of the location-scale family, but maybe there is a way.

betanalpha · January 15, 2018, 2:02pm

Marginalization is an implicit consequence of Monte Carlo integration.

We use (Markov chain) Monte Carlo to approximation posterior expectations through samples,

\hat{f}{N} = \frac{1}{N} \sum{n = 1}^{N} f(x)
\approx
E_pi [ f ] = \int dx pi(x) f(x).

Now let x be two dimensional, x = (x1, x2), but the function whose expectation we’re taking depend only on x1. Then

\hat{f}{N} = \frac{1}{N} \sum{n = 1}^{N} f(x_1)
\approx
E_pi [ f ] = \int dx_1 dx_2 pi(x_1, x_2) f(x_1).

But now we can marginalize x_2 out of the exact integral to give

\hat{f}{N} = \frac{1}{N} \sum{n = 1}^{N} f(x_1)
\approx
E_pi [ f ] = \int dx_1 pi(x_1) f(x_1).

In other words, if you run a Markov chain over the joint space (x_1, x_2) and then use samples from only one component to estimate expectations then we are implicitly marginalizing over the other component! This means that you can compute convolutions like the scale mixture in the case study by building a Stan program for the joint model and let the integration happen implicitly.

Henrik_Singmann · January 15, 2018, 4:06pm

Thanks for your response. I tried to format it more nicely to understand it easier. I hope this is what you intended to say. I will have to think a little bit about what this means for my case (if I have more questions, I will formulate them more clearly).

betanalpha · January 15, 2018, 7:12pm

The “n = 1” and “N” are subscripts and superscripts for the summation, respectively. It may help to write out the marginalization explicitly like

E[f]
= \int dx1 dx2 pi(x1, x2) f(x1)
= \int dx1 [ \int dx2 pi(x1, x2) ] f(x1)
= \int dx1 [ pi(x1) ] f(x1)
= \int dx1 pi(x1) f(x1)

Henrik_Singmann · January 16, 2018, 11:41am

Thanks, I think I got it now. The thing that was difficult to understand for me, was the idea of approximating the integral you are interested in when simultaneously sampling from the posterior. In other words, the step of also only approximating the integral one is interested in in the same manner one approximates the posterior.

One more thing. I also discussed this issue with Quentin Gronau (@Quentin) and he send me the following code suggesting that maybe the alternative implementation to the Cauchy requires s^2 and instead of s. This becomes apparent when changing s from 1 to e.g., .1

n <- 1e5
m <- 0.3
s <- 1

xa <- rnorm(n)

# 1
xb <- rgamma(n, 1/2, s/2)
x <- m + xa / sqrt(xb)
hist(x[abs(x) < m + 15*s], probability = TRUE, breaks = 50)
plot(function(x) dcauchy(x, m, s), add = TRUE, xlim = c(m - 15*s, m + 15*s))

# 2
xb2 <- rgamma(n, 1/2, (s^2)/2)
x2 <- m + xa / sqrt(xb2)
hist(x2[abs(x2) < m + 15*s], probability = TRUE, breaks = 50)
plot(function(x) dcauchy(x, m, s), add = TRUE, xlim = c(m - 15*s, m + 15*s))

betanalpha · January 16, 2018, 3:15pm

Yup, that was a typo on my end. Thanks for pointing it out.

I’m updating the case study now.

Topic		Replies	Views
Tutorial on Monte Carlo EM and variants for MML and MMAP Algorithms	16	3721	October 22, 2018
Discrete mixture models - Latent parameter recovery General	1	398	November 10, 2020
Inferring Ising Model parameters Modeling	10	1507	July 26, 2023
Does the cauchy/uniform reparametrization extend to the half-cauchy distribution? Modeling fitting-issues	8	1131	August 15, 2022
Using 1D numerical integration for doubly-intractable distributions Modeling integrator	3	713	April 7, 2020

Estimating compound distributions without numerical integration

Related topics