These papers describe a methodology that reduces the O(n^3) learning complexity to O(n) and makes the prediction complexity O(1), with potential scaling to millions or billions of points.
Basically it requires implementing Kronecker matrix algebra, including fast matrix-vector multiplies.
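As a minimal NumPy sketch (not Stan code, and not the papers' full implementation), the key trick is the standard Kronecker identity: a matrix-vector product with A ⊗ B can be done by reshaping the vector and using two small matrix multiplies, without ever forming the Kronecker product.

```python
import numpy as np

def kron_mv(A, B, x):
    """Compute (A kron B) @ x without forming the Kronecker product.

    A is (n, n), B is (m, m), x has length n*m.
    Uses the row-major identity (A kron B) vec(X) = vec(A X B^T),
    reducing cost from O((n*m)^2) to O(n*m*(n + m)).
    """
    n, m = A.shape[0], B.shape[0]
    X = x.reshape(n, m)            # row-major reshape matching np.kron ordering
    return (A @ X @ B.T).ravel()

# Check against the dense product on a small example.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))
x = rng.standard_normal(12)
assert np.allclose(kron_mv(A, B, x), np.kron(A, B) @ x)
```

With a Kronecker-structured covariance over D grid dimensions, the same idea applies factor by factor, which is what makes the fast solves and log-determinants in these methods possible.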
I’ve only thought about someone else implementing them. Pity they didn’t implement these directly in Stan.
More seriously: Having these would be great, but a big project to implement in Stan (bigger than writing demo code demonstrating that the methods work). There are plans to add simpler speed-ups first.
There are, however, limitations, which they also mention at some point in the papers (it would be so much nicer to mention the limitations on the first page). 1) Kronecker structure sets strict limitations on the covariance functions that can be used, making it less generic. 2) The number of latent values grows exponentially with the number of dimensions, and they say that it's unlikely to work beyond 5 dimensions; if I remember correctly they have demonstrated it only in 2D (or maybe 3D in another paper). For higher dimensions this needs some dimension reduction. 3) They don't do full Bayesian inference, and in the case of non-Gaussian likelihoods they use a Laplace approximation to integrate over the latent values. For Stan, a billion latent values would be a problem.
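To make limitation 2 concrete, here's a tiny illustration (my own numbers, not from the papers): with m grid points per input dimension, a Kronecker grid has m**D latent values, so the count explodes with dimension D.

```python
# Illustrative only: m inducing/grid points per dimension gives m**D latent values.
m = 100
for D in range(1, 6):
    print(f"D={D}: {m**D:,} latent values")
# D=5 already gives 10**10 latent values, far beyond what full MCMC over
# the latent field could handle.
```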
In addition to these basics (which Stan would hopefully have at some point), it's a bit more complicated to make it fast for Stan, but it would definitely require these first (and they would be helpful in other cases, too).
Note that since this is a community project, there is no specific road map, but based on public pull requests, issues, and discourse forum discussions, my guess is that before KISS-GP we'll see some of these: GPUs, MPI, more internal covariance functions, Kronecker, MRFs, basis functions for 1D, NNGP — unless someone volunteers to implement KISS-GP.
Yes, that's why I mentioned it too, as it will limit its use as a generic approach (the limitations due to dimensionality reduction may not be so clearly stated in the paper). Of course there are also many cases where the data lies on such a lower-dimensional manifold that dimension reduction is a relatively easy and sensible thing to do.