Applied Gaussian Processes in Stan, Part 1. A Case Study


Hey all - More specifically, hey GP people -

Writing up some stuff I’ve done over the summer at Aalto (although a bit late…). Aiming for an applied, pedagogical intro to applied Gaussian Processes in Stan, using a variety of open source data sets.

This is Part 1: Intro to GP notation, motivation behind GPs, some properties of kernels, GP regression and ARD.

Some things I could use help with:

  1. Hyperparameter priors: I could at least use a length scale that uses the scale of the data, as in Betancourt’s case study.

  2. Out of sample: Using Loo package instead of a naive RMSE.

  3. I was trying to show the smoothness changes in different cases of Matern kernel, but the sample size to high/vizualizations too small to show the “wiggliness” of the kernels.

Part 2 is coming. It has some different models including GP classificaiton, GP time series (modeling directly and using a GP prior in GARCH), and then a survival model. I split it up because the case study was becoming too long.

applied_gaussian_processes_in_stan.pdf (419.7 KB)

Gaussian Process(es): gp_exp_quad_cov not found
Experiment for Latent Gaussian Models
Gaussian process with hierarchical Gaussian process prior

I noticed that you are using some functions, such as add_diag and gp_exp_quad_cov, that do not appear to be defined in the functions block and also do not appear in the documentation for Stan. Are these undocumented features in the latest CmdStan release, or are they from a development version?


good question.

I think add_daig is in 2.18 math library, but it hasn’t been added to the language yet. There are covariance functions such as the matern32 and matern52 that I’m using locally, and still pushing through PRs to get them merged into the Stan library. Good call on the docs, I need to update the stan docs to show that at least add_diag has been included. Besides these technical issues, do you have any comments about modeling issues or the pedagogy of the doc in general? I know it’s basic, but I’m trying to show users how to build up from nothing!


I am curious about how identifiable the length scales in ARD are. I’ve had issues with that myself in dealing with GPs, and I’m curious as to how much of that is at my end and how much is just inherent to GP regression.


yeah, that warrants another case study. I could trivially make the real<lower=0>length_scale[D] an ordered vector, but then we would be assuming that the length_scale1 < length_scale2 < length_scale3 < … and I don’t think this is a good assumption. I could run a model a bunch of times w/ different random seeds and see where they tend to fall, and re-order the dataset in that way but it’s a hack… I’ll put some actually though into it, thank you.


Really fantastic work @drezap I took in a lot of lessons from your methodology and presentation.


That is a good question. I find there is often an interplay between the length scales and the variance, so that it may not make sense to treat the two as independent hyper-parameters.