Grid based gaussian process speedup

I’m elaborating here on @avehtari’s post and the underlying benefits of data that lives on a grid.

In order to evaluate the posterior mean in Gaussian process regression, (I am assuming Gaussian observation model), you need to solve a linear system of the form

(K + \sigma^2 I) \alpha = y

where y is observed data and K is an N \times N symmetric matrix such that K_{i, j} = k(x_i, x_j) where k is the covariance kernel of the Gaussian process and (x_1, y_1),…,(x_N, y_N) are observed data.

Suppose first that we are in 1 dimension and data locations x_1, x_2,…, x_N lie on an equispaced grid. When k is translation-invariant, that is k(x, y) is a function of x - y, then K is Topelitz (each diagonal contains only 1 unique value).

This matrix has some very desirable properties, but in this context probably the most important is that K can be applied to an arbitrary vector very quickly, in O(N\log{N}) time as opposed to the straightforward O(N^2). To see this, suppose that v \in \mathbb{R}^N. Then Kv satisfies

(Kv)_i = \sum_{j=1}^N k(x_i - x_j) v_j

for i=1,…,N. This computation is a discrete convolution — Convolution - Wikipedia — and can be computed using FFTs in O(N\log{N}) operations. The fast matrix-vector apply means that you can efficiently solve the above linear system iteratively (as long as the matrix K + \sigma^2I is well-conditioned, which is usually the case).

All of this follows through with data that lives on an equispaced grid in 2 dimensions (even though K is no longer Toeplitz). Matrix-vector multiplication is a 2d discrete convolution, which can be performed in O(N \log{N}) operations using 2-dimensional FFTs.

4 Likes