I would really like to understand this step more. Specifically, how it is possible to sample from this integral over tau without actually evaluating the integral? Is this a general way to evaluate compound distributions in Stan?

The reason I am asking is because I want to estimate a compound distribution of a von Mises with a Gamma (i.e., simply replace the Normal with a von Mises). This is the model presented in van den Berg et al. (2012)
(equation from the supplemental materials).

I guess this does not work for the von Mises, because the von Mises is not a member of the location-scale family, but maybe there is a way.

In other words, if you run a Markov chain over the joint space (x_1, x_2) and then use samples from only one component to estimate expectations then we are implicitly marginalizing over the other component! This means that you can compute convolutions like the scale mixture in the case study by building a Stan program for the joint model and let the integration happen implicitly.

Thanks for your response. I tried to format it more nicely to understand it easier. I hope this is what you intended to say. I will have to think a little bit about what this means for my case (if I have more questions, I will formulate them more clearly).

The βn = 1β and βNβ are subscripts and superscripts for the summation, respectively. It may help to write out the marginalization explicitly like

Thanks, I think I got it now. The thing that was difficult to understand for me, was the idea of approximating the integral you are interested in when simultaneously sampling from the posterior. In other words, the step of also only approximating the integral one is interested in in the same manner one approximates the posterior.

One more thing. I also discussed this issue with Quentin Gronau (@Quentin) and he send me the following code suggesting that maybe the alternative implementation to the Cauchy requires s^2 and instead of s. This becomes apparent when changing s from 1 to e.g., .1

n <- 1e5
m <- 0.3
s <- 1
xa <- rnorm(n)
# 1
xb <- rgamma(n, 1/2, s/2)
x <- m + xa / sqrt(xb)
hist(x[abs(x) < m + 15*s], probability = TRUE, breaks = 50)
plot(function(x) dcauchy(x, m, s), add = TRUE, xlim = c(m - 15*s, m + 15*s))
# 2
xb2 <- rgamma(n, 1/2, (s^2)/2)
x2 <- m + xa / sqrt(xb2)
hist(x2[abs(x2) < m + 15*s], probability = TRUE, breaks = 50)
plot(function(x) dcauchy(x, m, s), add = TRUE, xlim = c(m - 15*s, m + 15*s))