GP Basis Function Approximations for Product Kernels

I’ve recently been working through the paper Practical Hilbert space approximate Bayesian Gaussian
processes for probabilistic programming
, and the accompanying github repo.

Comparing the implementation of the birthday problem to that presented in BDA3 I noticed that this implementation excludes the weekly quasi-periodic term included in BDA3: which is the product of a square-exponential and periodic kernel.

I understand that this kernel is not directly amenable to the basis function approach as it is non-stationary, but would like to check my understanding of why we cannot combine the BF approximation for the SE kernel with the stochastic resonator approximation to the periodic kernel (described in Appendix B).

My current understanding is that the efficiency gain in both approximations is that both enable us to write the GP in the latent variable form

f = \Phi \Delta^{1/2}\beta

(adopting the notation from the paper, p.6-7) where only \Delta depends on the GP parameters - so in particular for each sampled set of parameters we do a single matrix multiplication.

On the other hand - the usual latent variable implementation of a GP involves repeatedly performing a Cholesky decomposition for each draw of the parameters - which is more computationally expensive than the matrix multiplication.

Is this interpretation of the efficiency boost correct?

Bringing this back to the product of kernels - my understanding is that we cannot identify the spectral decomposition of the product from that of the two kernels - meaning that even if we worked with the approximations we would still be forced to multiply and then perform a Cholesky decomposition, hence losing any efficiency gain?

Hopefully that’s clear - and would appreciate views on whether this is an accurate heuristic of the method.

Thanks!

Quasi-periodic was not used as it could require quite many basis functions (if the range is the whole time series) or the same basis functions but different coefficients for different weeks and time series model for those coefficients. With the latter approach the number of coeffiecients would the number of weeks times times the number of basis functions. Of course you use basis function approximation to model the the time series of of the coefficients, but it’s more complicated to make a multivariate time series model.

Non-stationarity is not a problem for basis functions. A linear model is non-stationary and also a perfect basis function presentation of GP with dot-product covariance function.

I’m not aware of how we could do (general) quasi-periodic time series GP model case efficiently with basis functions. I do now it’s possible to present them as state-space-models to get linear scaling with data http://proceedings.mlr.press/v33/solin14.html, but the state-space-models need also efficient implementations and especially they would require gradient computations written also in C++ as just autodiffing through Kalman filter is very slow. I’m not aware of such implementation at this time, but it’s something that would be nice to have.

@avehtari - Thanks for this, its really helpful: previously I hadn’t realised your point around the need for (no. weeks) x (no. BF) variables which I can see would become quite impractical.

With stationarity - my understanding when reading the paper was that our ability to construct the approximation is conditional on being able to use Bochner’s Theorem, for which stationarity is a neccessary condition.

I think I will need to revisit the paper to properly understand this.

Thanks again for your thoughtful comments.

1 Like

Sorry for the confusion. The method for generating basis functions in the paper assumes stationarity, but there are many ways to generate basis functions and I commented that in general it is possible to have basis functions that are non-stationary.

1 Like