Fast Gaussian Processes: kernel interpolation + Kronecker utilisation

Has anyone thought about implementing the methods described in Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP) or Thoughts on Massively Scalable Gaussian Processes?

These papers describe a methodology that reduces the O(n^3) learning complexity to O(n) and makes the prediction complexity O(1), with potential scaling to millions or billions of points.

Basically, it requires implementing Kronecker matrix algebra, including fast matrix-vector multiplies.
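
Not Stan code, but to make that building block concrete, here's a minimal NumPy sketch (the function name `kron_mvm` is mine, not from the papers) of multiplying a vector by K_1 ⊗ K_2 ⊗ … ⊗ K_d one small factor at a time, which is the operation these methods rely on:

```python
import numpy as np

def kron_mvm(Ks, x):
    """Compute (K_1 ⊗ K_2 ⊗ ... ⊗ K_d) @ x without forming the full matrix.

    Ks : list of (m_j, m_j) arrays, one per grid dimension
    x  : vector of length prod(m_j)
    """
    ms = [K.shape[0] for K in Ks]
    z = x.reshape(ms)  # view the vector as a d-dimensional array on the grid
    for j, K in enumerate(Ks):
        z = np.tensordot(K, z, axes=([1], [j]))  # apply K_j along axis j
        z = np.moveaxis(z, 0, j)                 # move the new axis back to position j
    return z.reshape(-1)

# Sanity check against the dense Kronecker product on a tiny example
rng = np.random.default_rng(0)
Ks = [rng.standard_normal((m, m)) for m in (3, 4, 5)]
x = rng.standard_normal(3 * 4 * 5)
assert np.allclose(np.kron(np.kron(Ks[0], Ks[1]), Ks[2]) @ x, kron_mvm(Ks, x))
```

The cost is roughly O(sum_j m_j * prod_k m_k) instead of O((prod_j m_j)^2) for the dense matrix-vector product.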

Looks really impressive…


Not a full implementation of KISS-GP or MSGP, but Rob Trangucci has example implementations of the Kronecker trick here.
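
For anyone who hasn't seen it, the basic Kronecker trick for GP regression on a full grid looks roughly like the sketch below. This is a hedged NumPy illustration (not Rob's Stan code; the function name `kron_gp_logmarg` is mine), assuming a Gaussian likelihood and a product kernel evaluated on a Cartesian grid:

```python
import functools
import numpy as np

def kron_gp_logmarg(Ks, y, sigma2):
    """Log marginal likelihood of a zero-mean GP observed on a full grid,
    assuming the covariance factorises as K = K_1 ⊗ ... ⊗ K_d plus
    iid Gaussian noise with variance sigma2. K itself is never formed.
    """
    eigvals, eigvecs = zip(*(np.linalg.eigh(K) for K in Ks))
    # Eigenvalues of the Kronecker product are all products of factor eigenvalues
    lam = functools.reduce(np.multiply.outer, eigvals).reshape(-1)
    # Rotate y into the eigenbasis, i.e. compute (Q_1 ⊗ ... ⊗ Q_d)^T y,
    # applying one small Q_j^T per grid dimension
    z = y.reshape([K.shape[0] for K in Ks])
    for j, Q in enumerate(eigvecs):
        z = np.moveaxis(np.tensordot(Q.T, z, axes=([1], [j])), 0, j)
    alpha = z.reshape(-1)
    quad = np.sum(alpha**2 / (lam + sigma2))
    logdet = np.sum(np.log(lam + sigma2))
    return -0.5 * (quad + logdet + y.size * np.log(2 * np.pi))
```

The point is that only the small per-dimension factors are eigendecomposed, so the cost is driven by the per-dimension sizes rather than by the full grid size.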


I’ve only thought about someone else implementing them. Pity they didn’t implement these directly in Stan.
More seriously: having these would be great, but it would be a big project to implement in Stan (bigger than writing demo code demonstrating that the methods work). There are plans to add simpler speed-ups first.

There are, however, limitations, which they also mention at some point in the papers (it would be so much nicer to mention the limitations on the first page):

1) Kronecker structure places strict restrictions on the covariance functions that can be used, making the approach less generic.
2) The number of latent values grows exponentially with the number of dimensions, and they say it's unlikely to work beyond 5 dimensions; if I remember correctly they have demonstrated it only in 2D (or maybe 3D in another paper). For higher dimensions this needs some dimension reduction (see the quick calculation after this list).
3) They don't do full Bayesian inference, and in the case of non-Gaussian likelihoods they use a Laplace approximation to integrate over the latent values. For Stan, a billion latent values would be a problem.
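
To give a back-of-the-envelope feel for point 2 (my own numbers, not from the papers): with m grid/inducing points per dimension, the grid has

$$
m_{\text{grid}} \;=\; \prod_{j=1}^{d} m_j \;=\; m^d,
\qquad
m = 100,\ d = 5 \;\Rightarrow\; m_{\text{grid}} = 100^5 = 10^{10},
$$

which is why the approach is practical only in low dimensions or after dimension reduction.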

In addition to these basics (which Stan will hopefully have at some point), it's a bit more complicated to make it fast in Stan, but it would definitely require these building blocks first (and they would be helpful in other cases, too).


What speedups are you referring to here?

The second paper discusses your point #2 here: it uses a form of dimensionality reduction to deal with this issue.

Note that since this is a community project, there is no specific road map, but based on public pull requests, issues, and Discourse forum discussions, my guess is that before KISS-GP we'll see some of these: GPUs, MPI, more internal covariance functions, Kronecker, MRFs, basis functions for 1D, NNGP (unless someone volunteers to implement KISS-GP).

Yes, that’s why I mentioned it, too, as it will limit its use as a generic approach (the limitations due to dimensionality reduction may not be stated so clearly in the paper). Of course, there are also many cases where the data lie in such a lower-dimensional manifold that it is a relatively easy and sensible thing to do.