My personal opinion about the use of GPs in Stan:
- Small to moderate-sized data (where uncertainty quantification is important).
- Non-linear models with implicit interactions (see, e.g., the leukemia example in BDA3; these are difficult or infeasible to do with splines etc.).
- Hierarchical non-linear models and non-linear latent functions as modular parts of bigger models (these limit which speed-up approximations can be used).
- GAMs with GPs (it's easier to set priors than for splines).
- Flexibility through user-defined covariance functions written as Stan functions (a functor approach for building the covariance matrix; see the sketch after this list).
- Laplace method for integrating over the latent values (this makes inference much faster, but it's applicable only to a restricted set of models).
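To illustrate the flexibility point, here is a minimal sketch of a user-defined covariance function written as an ordinary Stan function and used to build the covariance matrix in the model block. The kernel choice (exponentiated quadratic plus noise) and all names, such as `my_cov`, are illustrative assumptions, not a fixed interface:

```stan
functions {
  // User-defined covariance function: exponentiated quadratic kernel
  // plus observation noise on the diagonal. Any positive-definite
  // kernel expressible as a Stan function could be swapped in here.
  matrix my_cov(array[] real x, real alpha, real rho, real sigma) {
    int n = size(x);
    matrix[n, n] K;
    for (i in 1:n) {
      for (j in 1:n)
        K[i, j] = square(alpha) * exp(-0.5 * square((x[i] - x[j]) / rho));
      K[i, i] += square(sigma);
    }
    return K;
  }
}
data {
  int<lower=1> N;
  array[N] real x;
  vector[N] y;
}
parameters {
  real<lower=0> alpha;
  real<lower=0> rho;
  real<lower=0> sigma;
}
model {
  matrix[N, N] K = my_cov(x, alpha, rho, sigma);
  alpha ~ normal(0, 1);
  rho ~ inv_gamma(5, 5);
  sigma ~ normal(0, 1);
  y ~ multi_normal(rep_vector(0, N), K);
}
```

In practice one would work with the Cholesky factor and multi_normal_cholesky; the point here is only that the kernel is an ordinary Stan function.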
Especially:
- We shouldn't compete with specialized GP software that scales to bigger data (with less worry about uncertainty in the covariance function parameters) but is restricted in model structure. This means we are in no hurry to implement complicated speed-up approximations that can be used only for very restricted models.
- For many spatial and spatio-temporal data sets it's better to use Markov models with sparse precision matrices, as discussed in the “Sparse Matrices for Stan” document (the type of models commonly used with the INLA software); a minimal sketch follows.
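For intuition on why the Markov structure helps, here is a minimal sketch of an intrinsic CAR (ICAR) prior in the pairwise-difference form used in the Stan spatial case studies; the sparse precision matrix never has to be formed explicitly. The Poisson likelihood and the variable names are placeholder assumptions:

```stan
data {
  int<lower=1> N;                              // number of areas
  int<lower=1> N_edges;                        // number of neighbour pairs
  array[N_edges] int<lower=1, upper=N> node1;  // edge list of the
  array[N_edges] int<lower=1, upper=N> node2;  // adjacency graph
  array[N] int<lower=0> y;                     // counts per area
}
parameters {
  vector[N] phi;                               // spatial random effects
}
model {
  // ICAR prior: the log density is just a sum over neighbour pairs,
  // which is exactly the sparsity pattern of the precision matrix.
  target += -0.5 * dot_self(phi[node1] - phi[node2]);
  sum(phi) ~ normal(0, 0.001 * N);             // soft sum-to-zero constraint
  y ~ poisson_log(phi);
}
```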
- The wishlist for dense matrices in the “Sparse Matrices for Stan” document includes some things that are already in progress:
- Parallel (GPU and/or MPI) dense linear algebra
- General matrix algebra (sum, product, kronecker, matrix_divide, determinant) with reverse-mode derivatives
- Cholesky and derivative on GPU and general parallel
- See Stan manual Section 42.2 (Matrix arithmetic operations), 42.13 (Linear Algebra Functions and Solvers)
- Maybe also 42.4 (Elementwise Functions), 42.5 (Dot Products and Specialized Products)
- The internal covariance matrix functions @anon79882417 is working on will make some non-linear models with implicit interactions faster, as well as hierarchical non-linear models and non-linear latent functions used as modular parts of bigger models.
- The Laplace method Charles is working on will make inference faster for the combination of a GP prior on the latent function and a log-concave likelihood; a rough sketch of the underlying computation follows.
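As an illustration of what the Laplace method computes, here is a rough Stan-function sketch of the inner Newton iteration for the mode of p(f | y), assuming a Poisson observation model with log link and following Algorithm 3.1 of Rasmussen & Williams (2006). This is a sketch of the computation only, not the interface being implemented:

```stan
functions {
  // Newton iteration for the mode of p(f | y) under a GP prior with
  // covariance K and a Poisson(log link) likelihood
  // (Rasmussen & Williams 2006, Algorithm 3.1).
  vector gp_laplace_mode(matrix K, array[] int y, int n_iter) {
    int n = rows(K);
    vector[n] f = rep_vector(0, n);
    for (it in 1:n_iter) {
      vector[n] w = exp(f);                    // -d^2 log p(y|f) / df^2
      vector[n] sw = sqrt(w);
      vector[n] grad = to_vector(y) - exp(f);  // d log p(y|f) / df
      // B = I + W^{1/2} K W^{1/2} is symmetric positive definite
      // and well conditioned.
      matrix[n, n] B = add_diag(quad_form_diag(K, sw), 1.0);
      vector[n] b = w .* f + grad;
      vector[n] a = b - sw .* mdivide_left_spd(B, sw .* (K * b));
      f = K * a;
    }
    return f;
  }
}
```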
- The basis function representation of GPs Michael and Gabriel are working on will make GAMs with GPs, and 1D GPs as parts of bigger models, faster; a minimal sketch of the idea follows.
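For a flavour of the basis function idea (a Hilbert-space approximation in the style of Solin & Särkkä), here is a minimal 1D sketch for the exponentiated quadratic kernel. The number of basis functions M, the domain half-width L, and all variable names are my placeholder assumptions, not the interface being developed:

```stan
data {
  int<lower=1> N;
  vector[N] x;          // inputs, assumed roughly centred at zero
  vector[N] y;
  int<lower=1> M;       // number of basis functions
  real<lower=0> L;      // domain half-width, chosen larger than max(|x|)
}
transformed data {
  // Laplacian eigenfunctions on [-L, L]; the basis does not depend on
  // the kernel hyperparameters, so it is computed once.
  matrix[N, M] PHI;
  for (m in 1:M)
    PHI[, m] = sin(m * pi() * (x + L) / (2 * L)) / sqrt(L);
}
parameters {
  real<lower=0> alpha;  // marginal standard deviation of the GP
  real<lower=0> rho;    // lengthscale
  real<lower=0> sigma;  // noise standard deviation
  vector[M] beta;       // basis function weights
}
model {
  // Square root of the spectral density of the exponentiated quadratic
  // kernel, evaluated at the square roots of the eigenvalues.
  vector[M] sqrt_spd;
  for (m in 1:M) {
    real w = m * pi() / (2 * L);
    sqrt_spd[m] = alpha * sqrt(sqrt(2 * pi()) * rho)
                  * exp(-0.25 * square(rho * w));
  }
  alpha ~ normal(0, 1);
  rho ~ inv_gamma(5, 5);
  sigma ~ normal(0, 1);
  beta ~ std_normal();
  y ~ normal(PHI * (sqrt_spd .* beta), sigma);  // f = PHI * diag(sqrt_spd) * beta
}
```

The cost per log density evaluation is O(NM) instead of O(N^3), which is what makes this attractive for GAM components and 1D GPs inside bigger models.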
- The covariance matrix approach (cov_exp_quad etc.) is very limited in flexibility and also turns out to use a lot of memory when many covariance matrices are combined. That's why the wishlist has “Functor for GP specification”. “Sparse Matrices for Stan” lists:
  - An example would be `f ~ GP(function cov_function, tuple params_tuple, vector mean, matrix locations)` for the centred parameterisation.
  - For the non-centred parameterisation, you would need the back-transform `f = GP_non_centered_transform(vector non_centered_variable, function cov_function, tuple params_tuple, vector mean, matrix locations)`; a hand-written version is sketched after this list.
  - A `GP_predict()` function that takes the appropriate arguments and produces predictions.
  - The implementation would populate the matrix, do the Cholesky decomposition, and compute the appropriate quantities.
  - See also the topic “Question about autodiff for potential GP covfun implementation”.
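Until such a functor interface exists, the pattern those functions would package has to be written out by hand. A minimal sketch of the non-centred parameterisation with cov_exp_quad (variable names illustrative):

```stan
data {
  int<lower=1> N;
  array[N] real x;
  vector[N] mu;         // GP mean function values
  vector[N] y;
}
parameters {
  real<lower=0> alpha;
  real<lower=0> rho;
  real<lower=0> sigma;
  vector[N] eta;        // non-centred latent variables
}
transformed parameters {
  // The back-transform a GP_non_centered_transform() would hide:
  // f = mu + L_K * eta, with L_K the Cholesky factor of the covariance.
  vector[N] f;
  {
    matrix[N, N] K = add_diag(cov_exp_quad(x, alpha, rho), 1e-9); // jitter
    f = mu + cholesky_decompose(K) * eta;
  }
}
model {
  alpha ~ normal(0, 1);
  rho ~ inv_gamma(5, 5);
  sigma ~ normal(0, 1);
  eta ~ std_normal();   // implies f ~ multi_normal(mu, K)
  y ~ normal(f, sigma);
}
```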
Note that my wishlist doesn't yet include such speed-up approximations as inducing point approaches, as they are quite complicated to implement and use.