Define densities with no direct analytical expression in STAN


suppose I have a convolution of an exponential random variable X with unknown parameter \lambda and a normal random variable Y with unknown mean \mu (the latter for the sake of simplicity with known unit variance). The two unknown parameters should be estimated from i.i.d data.

Now, I know that in this case the convolution can be written in an explicit form. However, I am interested in learning something about the implementation of integrals in STAN. Hence, my goal is to implement this model in STAN without usage of the closed form expression; only using the expression:

f(z|\lambda, \mu)=\int_0^{\infty}\lambda e^{-\lambda v}\frac{1}{\sqrt{2\pi}}e^{-0.5*(z-v-\mu)^2}\mathrm{d}v

I started off by writing a function which approximates the integral via importance sampling, i.e. drawing N exponential random variables v_1, \dots v_N with parameter \lambda and then computing the average of \frac{1}{\sqrt{2\pi}}e^{-0.5*(z-v_i-\mu)^2}. However, I realized that I can only accomplish this in STAN by declaring the function with the rng suffix. But if I follow this declaration, I can only use this function in the generated quantities block.

According to the manual:
“Functions that call random number generating functions in their bodies must have a name that ends in _rng…user-defined functions with names that end in _rng may be used only in the generated quantities block and transformed data block, or within the bodies of user-defined functions ending in _rng”

So my question: How is it possible to implement such a density in STAN? (via importance sampling or numerical integration in general) Is there perhaps a way to write this function in another language (R, Python etc.) and use this within STAN?

Not with importance or any other kind of sampling. The HMC-based MCMC algorithms essentially assume that the log-kernel is evaluated deterministically. You could perhaps do some quasi-Monte Carlo stuff by passing a grid of values for v into the data block and summing your integrand over them, but I would not be surprised if the MCMC does not go well.

Stan 2.19 has a good implementation of numerical integration in one dimension and I suspect that would be a better choice.

The trapezoid rule works fine for integration if you’re talking about 1-D and you’re ok working with an approximate integral.

At its core Stan is just a way of specifying integrands and estimating the corresponding integrals. Consequently when you can always let Stan implement the convolution for you by introducing a latent parameter,

parameters {
  real<lower=0> v;
  real z;

model {
  v ~ exponential(lambda);
  z ~ normal(v + mu, 1);

That said, when the convolution is only one dimensional then it’s often more efficient to estimate it via quadrature, hence the recent inclusion of the integrate_1D function into Stan. Two dimensional quadrature is much harder, and past three dimensions letting the MCMC handle the convolution is your best bet.

1 Like

Thank you all very much! I need to look as to whether I can formulate my specific problem in terms of latent parameters. Otherwise, the numerical 1D integration seems fine (I did not know about such an implementation in STAN; could it cause any problems if I have an unbounded interval of integration?). Also good to know that the usage of importance sampling in the form I intended to use it, is not possible.