Grid-based Gaussian process speedup

Hello,

I have data sampled on a 2D regular rectangular grid, and would like to use Gaussian process methods for modelling.
My data can potentially be very large, so I’d like to use some kind of speedup or approximation. I’ve seen a few papers/articles/posts mention that regular grids allow for speedups of Gaussian process calculations.

The math is all very much over my head.

Is this method applicable in a 2D input space? The link above on Toeplitz matrices seems to indicate that it is not.

Are there any examples of grid based GPs in Stan that I could refer to?

If grid-based speedup doesn’t work for my problem, then I can use the HSGP method instead. I’m just curious if there are any resources.

Thank you,
Scott

How large? Knowing that would help to provide better recommendations on what to do. It would also be helpful to know something about the covariance function, such as whether it’s stationary and what the expected length scale is compared to the data range.

If the observation model is also Gaussian and the covariance function is stationary, you could use a Kronecker product approach. However, implementing this is a bit complex, and you don’t get the maximum benefit due to how Stan’s autodiff works.
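
To make the idea concrete, here is a minimal numpy sketch (outside Stan) of the Kronecker trick, assuming a separable (product) kernel and observations on a full n1 x n2 grid; the squared-exponential kernel, grid sizes, and hyperparameters below are placeholder choices, not anything specific to your problem:

```python
import numpy as np

n1, n2 = 50, 40
x1, x2 = np.linspace(0, 1, n1), np.linspace(0, 1, n2)
ell, sigma = 0.2, 0.3                         # length scale, noise sd (placeholders)

def se(a, ell):
    """Squared-exponential kernel matrix for a 1D point set."""
    return np.exp(-0.5 * ((a[:, None] - a[None, :]) / ell) ** 2)

# For a product kernel on a full grid, K = kron(K1, K2); only the small
# n1 x n1 and n2 x n2 factors are ever decomposed.
K1, K2 = se(x1, ell), se(x2, ell)
lam1, Q1 = np.linalg.eigh(K1)
lam2, Q2 = np.linalg.eigh(K2)

Y = np.random.default_rng(0).normal(size=(n1, n2))   # toy grid observations

# Solve (K + sigma^2 I) alpha = vec(Y) via
#   K + sigma^2 I = (Q1 kron Q2) diag(outer(lam1, lam2) + sigma^2) (Q1 kron Q2)^T
# and the identity (A kron B) vec(Y) = vec(A Y B^T) for row-major vec.
T = Q1.T @ Y @ Q2
T = T / (np.outer(lam1, lam2) + sigma**2)
alpha = (Q1 @ T @ Q2.T).reshape(-1)

# Sanity check against the dense solve (feasible only for small grids).
K = np.kron(K1, K2)
assert np.allclose(alpha, np.linalg.solve(K + sigma**2 * np.eye(n1 * n2),
                                          Y.reshape(-1)))
```

The point is that the full n1*n2 x n1*n2 covariance never has to be formed or factored; everything reduces to operations on the two small factors.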

HSGP is easy to implement, and easy to use with non-Gaussian observation models, as part of multilatent models, and in other more elaborate models. It provides a good speedup if the latent functions can be assumed to be smooth (not a very short length scale). If you tell me more about your modeling problem, I may be able to tell whether some other approach would be better.


I’m elaborating here on @avehtari’s post and the underlying benefits of data that lives on a grid.

In order to evaluate the posterior mean in Gaussian process regression (I am assuming a Gaussian observation model), you need to solve a linear system of the form

(K + \sigma^2 I) \alpha = y

where y is the vector of observations and K is an N \times N symmetric matrix with K_{i, j} = k(x_i, x_j), where k is the covariance kernel of the Gaussian process and (x_1, y_1),…,(x_N, y_N) are the observed data.

Suppose first that we are in 1 dimension and the data locations x_1, x_2,…, x_N lie on an equispaced grid. When k is translation-invariant, that is, k(x, x') is a function of x - x', then K is Toeplitz (each diagonal contains a single repeated value).

This matrix has some very desirable properties, but in this context probably the most important is that K can be applied to an arbitrary vector very quickly, in O(N\log{N}) time as opposed to the straightforward O(N^2). To see this, suppose that v \in \mathbb{R}^N. Then Kv satisfies

(Kv)_i = \sum_{j=1}^N k(x_i - x_j) v_j

for i=1,…,N. This is a discrete convolution (Convolution - Wikipedia) and can be evaluated with FFTs in O(N\log{N}) operations. The fast matrix-vector apply means that you can solve the above linear system efficiently with an iterative method (as long as the matrix K + \sigma^2 I is well conditioned, which is usually the case).
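
To make this concrete, here is a minimal numpy/scipy sketch (not Stan) of the fast matvec via circulant embedding, together with a conjugate-gradient solve of the linear system above; the squared-exponential kernel, grid size, and hyperparameters are placeholders:

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.sparse.linalg import LinearOperator, cg

N = 512
x = np.linspace(0.0, 1.0, N)                 # equispaced 1D grid
ell, sigma = 0.1, 0.5                        # length scale, noise sd (placeholders)

# First column of the Toeplitz matrix K, K[i, j] = k(x_i - x_j),
# here with a squared-exponential kernel.
col = np.exp(-0.5 * ((x - x[0]) / ell) ** 2)

# Embed K in a (2N - 1)-point circulant matrix whose first column is
# [k(0), k(h), ..., k((N-1)h), k((N-1)h), ..., k(h)] (k is even).
c = np.concatenate([col, col[:0:-1]])
fc = np.fft.rfft(c)

def K_matvec(v):
    """K @ v in O(N log N): circular convolution of c with zero-padded v."""
    vp = np.concatenate([v, np.zeros(N - 1)])
    return np.fft.irfft(fc * np.fft.rfft(vp), n=2 * N - 1)[:N]

# Sanity check against the dense Toeplitz matvec.
rng = np.random.default_rng(0)
v = rng.normal(size=N)
assert np.allclose(K_matvec(v), toeplitz(col) @ v)

# Solve (K + sigma^2 I) alpha = y with conjugate gradients, using only
# the fast matvec; the dense matrix is never formed.
y = np.sin(6 * np.pi * x) + 0.1 * rng.normal(size=N)   # toy data
A = LinearOperator((N, N), matvec=lambda w: K_matvec(w) + sigma**2 * w)
alpha, info = cg(A, y)
assert info == 0                              # 0 means CG converged
```

CG only ever touches K through matrix-vector products, which is exactly why the O(N\log{N}) apply is the property that matters here.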

All of this carries over to data that lives on an equispaced grid in 2 dimensions, even though K is no longer Toeplitz (it is block Toeplitz with Toeplitz blocks). Matrix-vector multiplication is then a 2D discrete convolution, which can be performed in O(N \log{N}) operations using 2-dimensional FFTs.
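
The same embedding trick works in 2D; here is a numpy sketch under the same assumptions (stationary squared-exponential kernel, placeholder sizes; the dense check at the end is feasible only for small grids):

```python
import numpy as np

n1, n2 = 32, 24                  # grid dimensions (small, to allow the dense check)
h1, h2 = 0.02, 0.03              # grid spacings (placeholders)
ell = 0.15
N = n1 * n2

# Kernel evaluated at all nonnegative grid offsets (i1 * h1, i2 * h2).
D1, D2 = np.meshgrid(h1 * np.arange(n1), h2 * np.arange(n2), indexing="ij")
kpos = np.exp(-0.5 * (D1**2 + D2**2) / ell**2)

# 2D circulant embedding of size (2 n1 - 1) x (2 n2 - 1): mirror the
# nonnegative offsets, using k(-r) = k(r) for a stationary (even) kernel.
m1, m2 = 2 * n1 - 1, 2 * n2 - 1
c = np.zeros((m1, m2))
c[:n1, :n2] = kpos
c[n1:, :n2] = kpos[:0:-1, :]
c[:n1, n2:] = kpos[:, :0:-1]
c[n1:, n2:] = kpos[:0:-1, :0:-1]
fc = np.fft.fft2(c)

def K_matvec(v):
    """K @ v for v of length n1 * n2 (row-major grid order), via 2D FFTs."""
    V = np.zeros((m1, m2))
    V[:n1, :n2] = v.reshape(n1, n2)
    return np.fft.ifft2(fc * np.fft.fft2(V)).real[:n1, :n2].reshape(-1)

# Sanity check against a dense K built from pairwise distances.
pts = np.stack([D1.reshape(-1), D2.reshape(-1)], axis=1)
diff = pts[:, None, :] - pts[None, :, :]
K = np.exp(-0.5 * (diff**2).sum(-1) / ell**2)
v = np.random.default_rng(0).normal(size=N)
assert np.allclose(K_matvec(v), K @ v)
```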


fft and inv_fft are going to be in the upcoming 2.30 release. @pgree would you have any interest in putting together a simple example of using this for these Toeplitz GPs? I believe the release candidate will go out today (or very soon after if there are hiccups). We could probably add it to the Stan Users Guide.


Hey folks,

Thank you for the response, I really appreciate how helpful the Stan devs and community are.

How large? Knowing that would help to provide better recommendations on what to do. It would also be helpful to know something about the covariance function, such as whether it’s stationary and what the expected length scale is compared to the data range.

@avehtari My data is a set of binary images, i.e., each pixel is either 0 or 1. Each image can have anywhere from ~2000 to ~8000 pixels. My data will initially contain a small set of ~30-60 of these images. I want a model that learns from this training set to classify whether each image contains a specific shape, then to run this classification model on a larger set of images (up to ~10,000) to identify other images that contain the same pattern, repeating the process for many different shapes. There are more complications regarding the kinds of shapes I am looking for, but this is the gist of my problem. GPs are attractive because the initial training set needs to be hand-labelled, so being able to train the model with a small amount of data is very useful.

I’m not exactly sure what my speed requirements are. After reading more on Toeplitz matrix methods, I am leaning more and more towards HSGP, since the Toeplitz methods are quite complicated and it looks like they haven’t been used in Stan before (likely due to the FFT and circulant embedding requirements). I am surprised that I wasn’t able to find a clear tutorial on Toeplitz methods for GP speedup, since the method seems applicable to image processing; maybe I didn’t search the literature in the right places.

If the observation model is also Gaussian and the covariance function is stationary, you could use a Kronecker product approach. However, implementing this is a bit complex, and you don’t get the maximum benefit due to how Stan’s autodiff works.

The observation model is binomial, not Gaussian; the covariance function is stationary.

This matrix has some very desirable properties, but in this context probably the most important is that K can be applied to an arbitrary vector very quickly

@pgree I was under the impression that Toeplitz methods were primarily useful for 1) inverting the covariance matrix and 2) computing the log determinant of the covariance matrix ([1511.01870] Thoughts on Massively Scalable Gaussian Processes). Am I wrong here? Maybe that’s the use case for non-Gaussian observation models.

Thank you all again for the helpful responses.
Scott

I don’t see how you would use GPs here. When you earlier said “I have data sampled on a 2D regular rectangular grid,” I assumed you would do spatial modeling/smoothing, but GPs are not very good for modeling images. Now you say “classify if each image contains a specific shape from this training set,” and plain GPs are not very good at image classification from pixel-value data (although people have done that, at least for demonstration purposes, with the MNIST digit and fashion data sets of 28x28-pixel images).

If you have access to those 10,000 images, then you could fit a semi-supervised/self-supervised convolutional deep learning (or convolutional GP) model to learn a lower-dimensional latent space of the shapes; then it might be possible to learn a classifier with just 30 images, and GPs could be useful there as well. All this goes beyond Stan’s comfort zone.


Thank you for the reply. I’m sorry, I think I’ve used some bad terminology here. They’re not really images from a camera; I have just gotten used to thinking of this problem in terms of images. The grid dimensions have well-defined spatial lengths, and there is a continuous function underlying the patterns I’m looking for. I’m sorry for being vague; I am just unable to share too much of what I’m working on. I have been considering a neural network approach, but I think the kinds of patterns I am interested in are simple enough to handle with GPs. Anyway, for now I think I’ve decided not to pursue the Toeplitz method, although I am interested in seeing if someone is able to implement it in Stan using FFT.

Thanks for the advice, folks.

Did anyone end up writing up an example of this?