Applied Gaussian Processes in Stan, Part 1. A Case Study

anon79882417 · August 31, 2018, 11:59am

Hey all - More specifically, hey GP people -

Writing up some stuff I’ve done over the summer at Aalto (although a bit late…). Aiming for an applied, pedagogical intro to applied Gaussian Processes in Stan, using a variety of open source data sets.

This is Part 1: Intro to GP notation, motivation behind GPs, some properties of kernels, GP regression and ARD.

Some things I could use help with:

Hyperparameter priors: I could at least use a length scale that uses the scale of the data, as in Betancourt’s case study.
Out of sample: Using Loo package instead of a naive RMSE.
I was trying to show the smoothness changes in different cases of Matern kernel, but the sample size to high/vizualizations too small to show the “wiggliness” of the kernels.

Part 2 is coming. It has some different models including GP classificaiton, GP time series (modeling directly and using a GP prior in GARCH), and then a survival model. I split it up because the case study was becoming too long.

applied_gaussian_processes_in_stan.pdf (419.7 KB)

jjramsey · August 31, 2018, 12:51pm

I noticed that you are using some functions, such as add_diag and gp_exp_quad_cov, that do not appear to be defined in the functions block and also do not appear in the documentation for Stan. Are these undocumented features in the latest CmdStan release, or are they from a development version?

anon79882417 · September 1, 2018, 10:42pm

good question.

I think add_daig is in 2.18 math library, but it hasn’t been added to the language yet. There are covariance functions such as the matern32 and matern52 that I’m using locally, and still pushing through PRs to get them merged into the Stan library. Good call on the docs, I need to update the stan docs to show that at least add_diag has been included. Besides these technical issues, do you have any comments about modeling issues or the pedagogy of the doc in general? I know it’s basic, but I’m trying to show users how to build up from nothing!

jjramsey · September 2, 2018, 12:51pm

I am curious about how identifiable the length scales in ARD are. I’ve had issues with that myself in dealing with GPs, and I’m curious as to how much of that is at my end and how much is just inherent to GP regression.

anon79882417 · September 2, 2018, 4:48pm

yeah, that warrants another case study. I could trivially make the real<lower=0>length_scale[D] an ordered vector, but then we would be assuming that the length_scale1 < length_scale2 < length_scale3 < … and I don’t think this is a good assumption. I could run a model a bunch of times w/ different random seeds and see where they tend to fall, and re-order the dataset in that way but it’s a hack… I’ll put some actually though into it, thank you.

increasechief · September 2, 2018, 6:03pm

Really fantastic work @anon79882417 I took in a lot of lessons from your methodology and presentation.

jgcolman · December 11, 2018, 10:53am

That is a good question. I find there is often an interplay between the length scales and the variance, so that it may not make sense to treat the two as independent hyper-parameters.

NTorabi · July 15, 2019, 11:48pm

Hey @anon79882417, thanks for putting all these together. By any chance, have you prepared the part related to classification? I am trying to understand the function gp_pred_rng(…) and I thought asking your help/write-up. Thanks

NTorabi · July 16, 2019, 3:06am

@anon79882417 Yes I am a real person with a real question. Considering that the likelihood is not Guassian in the GP classification, I am trying to understand how did you come up with the posterior predictive for the test data in function gp_pred_rng(…)? I am basically referring to this post that you have for classification.

anon79882417 · July 16, 2019, 8:44am

The integral were evaluation for a non Gaussian likelihood is this:

\int \pi(y|f, X, \theta) \pi(f | X, \theta) \pi(x|\theta)

In Gaussian likelihood case the \pi(f | •) gets marginalized out, but here it does not. Instead you generate latent f from:

Computer covariance function K
Compute cholesky decomp K=LL’
Then use L to sample from MV Gaussian and f is a realization from that MV Gaussian so: f = \rho * L, \rho \sim N(0, 1)
And then use that f in your likelihood which I depends on the model you specify. f could be a prior for a parameter in a linear model, you could use it directly in the logistic or multi nominal likelihood, etc.

A good reference is GPML, and I think most of this is in the Classification chapter except step 4.

The equations for ‘gp_pred_rng’ are in the document, I believe on the first page. Keywords are “posterior predictive”

anon79882417 · July 16, 2019, 8:47am

Yes, apologies. Please let me know if you have any other questions, and feel free to open another thread with code if you have questions on implementation details. There are better references than that document.

Andre

betanalpha · July 16, 2019, 9:02am

See Section 3.3 in https://betanalpha.github.io/assets/case_studies/gp_part1/part1.html#33_simulating_from_a_gaussian_process_conditional_on_non-gaussian_observations for an example of how to implement a non-Gaussian observational model. If you replace the Poisson distributions (for integer-valued observations) with Bernoulli distributions (for binary-valued observations) then you get a Gaussian process model amenable to classification. In other words you end up modeling the logit probability with the Gaussian process.

Max_Mantei · November 21, 2019, 10:57am

Hi! Cool stuff, thanks! Did you ever find the time to write that second part? :)

Topic		Replies	Views
Modeling time dynamics with GP Modeling specification , gaussian-process	12	650	April 29, 2020
GP Docs - Predictive Inference for Gaussian Processes Proposals Developers gaussian-process , docu	6	852	February 5, 2024
diagSPD for Matérn GP kernels other than 3/2 Modeling techniques , specification , gaussian-process	10	167	October 29, 2024
Gaussian process kernals Modeling gaussian-process	9	823	June 21, 2020
Seeking advice on optimizing a Gaussian Process (GP) prior to improve my model's convergence time Modeling	5	415	June 1, 2024

Applied Gaussian Processes in Stan, Part 1. A Case Study

Related topics