Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming

We have a new paper: Gabriel Riutort-Mayol, Paul-Christian Bürkner, Michael R. Andersen, Arno Solin, and Aki Vehtari (2020). Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. https://arxiv.org/abs/2004.11408, with code in Stan and brms.

The approximation itself is not new, as it is based on the Hilbert space basis function approach by Solin and Särkkä: https://doi.org/10.1007/s11222-019-09886-w

The present paper adds:

  • analysis of the performance and diagnostics in relation to the number of basis functions, domain of the prediction space, and smoothness of the latent function
  • practical implementation details for Stan

In addition to simulations, we re-create two models from BDA3 (originally implemented with GPstuff):

  • birthday model with 7305 observations in one dimension
  • leukemia survival model with 1043 observations in four dimensions with full interactions

In addition to the Stan code at https://github.com/gabriuma/basis_functions_approach_to_GP, this approximation has been available for some time in brms https://rdrr.io/cran/brms/man/gp.html, but we haven't advertised it much before getting the paper ready.

When GPs are implemented using covariance matrices, the computational cost is O(n^3) for every log density evaluation. This reduced-rank approximation has a pre-computation cost of O(m^2 n) for m basis functions, but after that the computational cost per log density evaluation is O(mn + m). The number of basis functions needed depends on the wiggliness of the target function and the number of dimensions. The scaling in the number of dimensions is exponential, restricting practical use to 1-4 dimensions with full interactions. With additive models the scaling is linear in the number of additive components. Additional benefits of the basis function approximation in Stan are the reduced size of the autodiff tree and the ability to use compound GLM functions for further speedup.
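To illustrate the idea, here is a minimal NumPy sketch (not the paper's Stan code) of the rank-m covariance approximation for a squared exponential kernel in 1-D; the domain half-width L and basis size m are illustrative choices:

```python
import numpy as np

def phi(x, j, L):
    # Eigenfunctions of the Laplacian on [-L, L] with Dirichlet boundaries
    return np.sqrt(1.0 / L) * np.sin(np.pi * j * (x + L) / (2 * L))

def se_spectral_density(w, l):
    # Spectral density of the unit-variance squared exponential kernel
    # (Solin & Sarkka Fourier convention)
    return np.sqrt(2 * np.pi) * l * np.exp(-0.5 * (l * w) ** 2)

def approx_cov(x, m, L, l):
    j = np.arange(1, m + 1)
    sqrt_lam = np.pi * j / (2 * L)             # square roots of the eigenvalues
    Phi = phi(x[:, None], j[None, :], L)       # n x m basis matrix: O(mn)
    S = se_spectral_density(sqrt_lam, l)       # m spectral weights
    return (Phi * S) @ Phi.T                   # rank-m covariance matrix

x = np.linspace(-1, 1, 50)
l = 0.3
K_exact = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / l**2)
K_approx = approx_cov(x, m=40, L=2.0, l=l)
print(np.max(np.abs(K_exact - K_approx)))  # tiny, provided L extends past the data
```

In a sampler, the n x m matrix Phi is fixed and only the spectrally weighted coefficients change each iteration, which is where the O(mn) per-evaluation cost comes from.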

EDIT: added a paragraph on computational cost.


Hi Aki, I have a quick question about this paper. I am trying to implement the approximation for an exponential covariance function (\nu = 1/2), which isn't explicitly presented in the paper.

It should be pretty straightforward to substitute into the formula for the Matérn-class spectral density. When I did this, however, I noticed that the 4\pi^2 factor seems to be missing from equations (1), (2), and (3). Looking at some of the reference material, I also noticed that Rasmussen and Williams use a Fourier transform convention with the 2\pi in the exponent of the complex exponential, while Solin and Särkkä use a convention with a \frac{1}{2\pi} factor outside the integral.

I did a little testing, and the method in Riutort-Mayol et al. works with equations (1), (2), and (3) as presented, but seemingly not if one re-derives them from the Rasmussen and Williams spectral density equation and keeps the 4\pi^2 factor.
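For concreteness, substituting \nu = 1/2 (the exponential kernel k(r) = \sigma^2 e^{-|r|/l}) into the Matérn spectral density under the Solin and Särkkä convention should give S(\omega) = 2\sigma^2 l/(1 + l^2\omega^2) in one dimension; this is my own derivation, so worth double-checking, but a brute-force numerical transform agrees with it:

```python
import numpy as np

def S_exp(w, l):
    # Conjectured 1-D spectral density of k(r) = exp(-|r|/l)
    # under the Solin & Sarkka Fourier convention (unit variance)
    return 2 * l / (1 + (l * w) ** 2)

# Brute-force numerical Fourier transform of the exponential kernel
l, w = 0.5, 3.0
r = np.linspace(-50, 50, 2_000_001)
dr = r[1] - r[0]
# The imaginary part vanishes by symmetry, so integrate against cos(w*r)
S_num = np.sum(np.exp(-np.abs(r) / l) * np.cos(w * r)) * dr
print(S_exp(w, l), S_num)  # both approximately 0.3077
```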

Can you provide any clarity here?

Sorry, these are not things that stick in my memory, and I don't have time right now to re-derive them. Our equations were checked by Solin (also a co-author), so I trust them. You could email Solin for clarification; he has thought about these more than I have, so he might remember the reason for the different-looking equations.

No worries. I'm fairly sure it just has to do with the Fourier transform convention used. Looking at the \nu=\infty case with the Rasmussen and Williams convention:
S(\omega) = \int_{\mathbb{R}} e^{-r^2/(2l^2)}\, e^{-2\pi i\omega r}\, dr = \sqrt{2\pi}\, l\, e^{-2\pi^2 l^2 \omega^2}, which matches the formula given in the textbook (p. 83).
Using the Solin and Särkkä convention:
S(\omega) = \int_{\mathbb{R}} e^{-r^2/(2l^2)}\, e^{-i\omega r}\, dr = \sqrt{2\pi}\, l\, e^{-l^2\omega^2/2}, which matches the formula in the Riutort-Mayol et al. paper.

So it seems that, for the general case (where I wasn't able to verify the integration), one substitutes \omega \to \omega/(2\pi) in the Rasmussen and Williams equation to obtain equations (1)-(3).
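A quick numerical sanity check of that frequency rescaling for the squared exponential case (my own check, using the two spectral density formulas above):

```python
import numpy as np

def S_rw(w, l):
    # Rasmussen & Williams convention: exp(-2*pi*i*w*r) inside the transform
    return np.sqrt(2 * np.pi) * l * np.exp(-2 * np.pi**2 * l**2 * w**2)

def S_ss(w, l):
    # Solin & Sarkka convention: exp(-i*w*r) inside the transform
    return np.sqrt(2 * np.pi) * l * np.exp(-0.5 * l**2 * w**2)

l = 0.7
w = np.linspace(-5, 5, 101)
# Rescaling the frequency by 1/(2*pi) maps one convention onto the other:
print(np.allclose(S_ss(w, l), S_rw(w / (2 * np.pi), l)))  # True
```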
