Generalized Ziggurat Algorithm for Unimodal and Unbounded Probability Density Functions with Zest

This might be a useful for adding more univariate rng’s, maybe efficient truncated distribution rngs and way to generate random number given just the user defined _lpdf.

Generalized Ziggurat Algorithm for Unimodal and Unbounded Probability Density Functions with Zest
by Jalalvand and Charsooghi

We present a modified Ziggurat algorithm that could generate a random number from all unimodal and unbounded PDFs. For PDFs that have unbounded density and/or unbounded support we use a combination of nonlinear mapping function and rejection sampling to generate a random number from the peak and/or the tail distribution. A family of mapping functions and their corresponding acceptance probability functions are presented (along with the criteria for their use and their efficiency) that could be used to generate random numbers from infinite tails and unbounded densities.
The Zest library which is a C++ implementation of this algorithm is also presented. Zest can efficiently generate normal, exponential, cauchy, gamma, Weibull, log-normal, chi-squared, student’s t and Fisher’s f variates. The user can also define their custom PDF as a class and supply it as a template argument to our library’s class without modifying any part of the library. Performance of Zest is compared against performance of random modules of (GCC’s implementation of) Standard Template Library (STL) and Boost. The presented results show that Zest is faster than both in most cases, sometimes by a factor of more than 10.
We also present a C++ implementation of a uniform floating-point random number generator (RNG) which is capable of producing all representable floating-point numbers in [0,1) which will be used in the Ziggurat algorithm near unbounded peaks and tails. The common method of dividing a random integer by the range of the RNG can not produce random floating-point numbers with fully random fraction bits and very small random numbers. The presented uniform floating-point RNG is very efficient and in the case of producing double precision floating-point numbers it’s even faster than simply multiplying a 64-bit integer by 2−64.

It is GPL so it would have to go as an rstan add-on. Also,

Finally it should be mentioned that although the Ziggurat algorithm is very fast, it has a long setup times. This is not a problem for Cauchy, normal and exponential distributions as every distribution of these kinds can be generated with shifting and scaling the corresponding
distribution with standard parameters. But for applications requiring log-normal, gamma,
Weibull, student’s t or Fisher’s f variates with frequently changing shape parameter, the
Ziggurat algorithm is not a suitable choice.

We could possibly implement the uniform PRNG off of the text of Section 4.

How efficient would it be to define inverse CDFs via our algebraic solvers? Then we could do all the unimodal ones pretty easily.

Even if we can’t use the code due to GPL, we could use the algorithms if they are useful.

Not efficient enough. However, the Boost root-finders would work nicely in the cases where the Boost distributions do not already have a quantile method (which we are not exposing currently).

We’d expose the inverse CDFs if we could figure out how to compute the derivatives.

With respect to the distribution’s parameters? That is the same logic as in the algebraic solvers, but the Boost solvers are more appropriate because they are

  1. Unidimensional
  2. Guard that the solution is bounded to the appropriate interval
  3. Can utilize second or third derivatives with respect to the unknown when finding the roote

Yes, w.r.t. all the arguments. If we can do that, let’s add them. Could you open a math issue with a hint as to how to proceed?

Not sure if this got answered elsewhere, but derivatives of quantile functions are easy if you can compute derivatives of the CDF. Quick derivation follows:

If inverse_cdf(cdf(x; θ); θ) = x, then by total differentiation
∂inverse_cdf(cdf(x; θ); θ)/∂θ + ∂inverse_cdf(cdf(x; θ); θ)/∂u * ∂cdf(x; θ)/∂θ = 0
∂inverse_cdf(cdf(x; θ); θ)/∂θ + pdf(x; θ)^{-1} * ∂cdf(x; θ)/∂θ = 0
∂inverse_cdf(cdf(x; θ); θ)/∂θ = –pdf(x; θ)^{-1} * ∂cdf(x; θ)/∂θ

This uses the identity ∂inverse_cdf(u; θ)/∂u = (∂cdf(x; θ)/∂x)^{-1} = pdf(x; θ)^{-1}.


I’m an author of the mentioned paper. I know this thread is very old but I thought I might be helpful here.

Knowing the inverse CDF is not necessary in this algorithm (the CDF itself is always needed to setup the Ziggurat algorithm). There are several cases here. One is that the distribution has compact support (i.e. the PDF is non-zero only in a finite interval) in which case the PDF is sufficient. If the distribution extends to infinity, there are several ways to generate random numbers from the tail distributions. If the inverse CDF is known it could be used to directly map uniform random numbers to the tail distribution. But there are also ways based on both mapping and rejection sampling that could do with the inverse PDF or just the PDF itself (but requires some mathematical work).

If you have questions regarding the paper or the code, please feel free to ask. I would be glad to help.

A separate (and optimized) implementation of the uniform PRNG is available in this repository:

As explained in there this improves the accuracy of other algorithms such as Box-Muller particularly in the tails.

If needed I could re-license the code under a more permissive license.

1 Like

Thanks for jumping onto the Stan forums and for the offer to help.

Stan models have support over all of \mathbb{R}^N. But they’re only an inverse logit and Jacobian away from having support on (0, 1)^N.

I took a brief look at the repo (thanks for writing a clear introduction!). We use the L’Ecuyer linear congruential RNG because it’s easy to advance. A new RNG would be pretty easy to plug in as long as it could advance easily like the linear congruential ones.

We use the implementation from Boost. We never found any difference in anything we did plugging in different RNGs, but then we didn’t go out looking for edge case trouble. I’m pretty sure Boost’s implementation uses 256 bit state and produces 64 bit output.

Usually we reparameterize rather than having to deal with numbers smaller than 2^{-32} much less 2^{-64} because of all the other numerical headaches they cause. The target is independent, unit-scale parameters, because then we don’t have to adapt and the posterior’s Hessian is well conditioned.

1 Like

Inverse logit can map them to (0,1)^N but the resulting compact distribution might have unbounded density at both 0 and 1 (e.g. try Cauchy distribution). Handling unbounded distributions requires mathematical work and could be inefficient. Inverse logit is suitable for light-tailed distributions (those that decay at least as fast as an exponential).

It’s actually a distribution and not a generator. It takes one (or rarely a few) random integer(s) from a user-selected RNG and turn it into a floating-point number between 0 and 1. It could be used with linear congruential RNGs or any other integer RNGs. The RNG should be specified as a template parameter. It’s up to the user to choose between a high-quality RNG or one with a small state or short period.

These inaccuracies manifest themselves far into the tail of distribution where the probability is very low anyway, therefore for many applications they could be overlooked. With large enough samples the deviation could be detected in a Kolmogorov-Smirnov test. How large is large enough depend on the distribution and the mapping function.

As an example try mapping a number between 0 and 1 to an exponential distribution with -log_2 (x) both for a true 32-bit floating-point number and a semi fixed-point one produced by dividing a 32-bit integer by 2^{32}. The first gap appears at 9 (where the next floating-point number smaller than 2^{-9} is 2^{-9}-2^{-33}, while the next semi fixed-point number would be 2^{-9}-2^{-32}). After taking the log this gap is quite small (about 10^{-7}) and the floating-point numbers only provides one additional number in this gap. The last gap of course appears between 31 and 32. The semi fixed-point numbers produce no number in this interval but the floating-point numbers can produce 2^{24} additional numbers in this interval. Also for semi fixed-point numbers there would be no exponential variates after 32. Again this is very far into the tail of the distribution but if you have really large sample the effects would be clearly observable.

Boost uses the Ziggurat algorithm for most of the exponential distribution and starts using log function only far into the tail so this problem occurs much further into the tail but the STL library as implemented by GCC uses log from the beginning and would encounter this problem sooner.

1 Like

Thanks for the clarification. That actually clears up a lot of this in my mind in that I never understood why everything couldn’t just be transformed down to [0, 1].

Couldn’t we then use the Cauchy cdf for heavier tails? I guess the tricky part’s doing this in a sensible multivariate way.

That’s a bit trickier for our architecture. We use the Boost PRNGs and they’re designed to generate integer values. Then those get transformed by all the specific PRNGs internally. Those would have to be rewritten.

It’d be easy enough to do with our randomization for HMC itself, but tails in the randomization isn’t an issue there.

Art Owen was suggesting we could do quasi-MCMC with short-period PRNGs, but we never tried it.

Thanks. I should add this all to the discussion of floating point, where I talk about asymmetry of differences around 1 and 0 and catastrophic cancellation and other basics of floating point.

1 Like

Yes. Rarely some distributions could be heavier-tailed than Cauchy but that’s rare and there are mappings for them as well. Ideally one should not use a heavy-tailed distribution’s CDF to map a light-tailed one.

Boost uses Ziggurat algorithm for exponential and normal distributions only. I should also correct a mistake: Boost uses the Ziggurat algorithm recursively for the tail of the exponential (due to the self-similarity of the distribution) rather than using the log function.