GP Docs - Predictive Inference for Gaussian Processes Proposals

Hey -

Users can’t easily use our documentation to implement posterior predictives for GPs, and they’re leaning on other resources. Here are some proposals for how I’d like to rewrite the section “Predictive Inference with a Gaussian Process”. Our user base mostly comes from an applied background and wants to see examples of common models that Gaussian processes can be used for, not just Stan code and equations.

  1. First, we’d want predictive inference for GPs to be its own section (10.4, or wherever).
  2. In the applied Gaussian process literature, it’s common to see the “joint covariance matrix” introduced. I think this is good to introduce, as it would help our users see the similarities between our documentation and the literature. For those not up to speed, that’s the following matrix. Let K_{X^*,X^*} be the covariance matrix computed from unseen data, K_{X,X} the covariance matrix computed from training data, and K_{X^*,X} one of the two cross-covariance matrices (the other is its transpose, K_{X,X^*}). Then K = [K_{X,X}, K_{X,X^*}; K_{X^*,X}, K_{X^*,X^*}] in MATLAB matrix shorthand.
    We can then review the prediction equations for the posterior predictive for GPs (e.g. f_\mu = K_{X^*,X} K_{X,X}^{-1} y for the predictive mean function without noise). We’ll include code examples showing how to implement GP posterior predictives with and without noise, and with a hierarchical prior for the noise.
  3. After this, although the code will mostly be the same, we’ll use three examples with statistical graphics to help show the user what we’re estimating. They will be:
    a. A simple regression example (easy to understand, no time component, just new, unseen data).
    b. A time series example (everyone will know how to code the example on the cover of BDA3 in Stan).
    c. A non-Gaussian likelihood example; let’s do a simple survival model. This should make it more transparent how to implement GP priors for parameters in arbitrary models. It will use real data and may be a simplified version of the leukemia example from BDA3 with a different likelihood.
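
To make the predictive mean in item 2 concrete, here is a minimal sketch in plain Python (not Stan, and not the proposed documentation code). It assumes a zero-mean GP with a squared-exponential kernel and fixed hyperparameters; the function names and the parameters alpha, rho, sigma, and jitter are hypothetical choices for illustration. With sigma = 0 it computes K_{X^*,X} K_{X,X}^{-1} y; with sigma > 0 it uses the standard noisy form K_{X^*,X} (K_{X,X} + sigma^2 I)^{-1} y.

```python
import math

def sq_exp_kernel(xs1, xs2, alpha=1.0, rho=1.0):
    """Squared-exponential covariance between two lists of scalar inputs."""
    return [[alpha ** 2 * math.exp(-0.5 * ((a - b) / rho) ** 2) for b in xs2]
            for a in xs1]

def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) w = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (z[i] - sum(L[k][i] * w[k] for k in range(i + 1, n))) / L[i][i]
    return w

def gp_predict_mean(x, y, x_star, sigma=0.0, alpha=1.0, rho=1.0, jitter=1e-9):
    """Predictive mean K_{X*,X} (K_{X,X} + sigma^2 I)^{-1} y.

    sigma is the observation-noise standard deviation (0 means noise-free);
    jitter is a small diagonal term added for numerical stability (assumption).
    """
    K = sq_exp_kernel(x, x, alpha, rho)
    for i in range(len(x)):
        K[i][i] += sigma ** 2 + jitter
    w = chol_solve(cholesky(K), y)                 # (K + sigma^2 I)^{-1} y
    K_star = sq_exp_kernel(x_star, x, alpha, rho)  # cross-covariance K_{X*,X}
    return [sum(k * wi for k, wi in zip(row, w)) for row in K_star]

# Toy data, made up purely for illustration.
x_train = [-1.0, 0.0, 1.5]
y_train = [0.5, -0.2, 1.0]
x_new = [-0.5, 0.75]
mean_noise_free = gp_predict_mean(x_train, y_train, x_new)
mean_noisy = gp_predict_mean(x_train, y_train, x_new, sigma=0.5)
```

The solve goes through a Cholesky factor instead of forming K_{X,X}^{-1} explicitly, which is the usual choice for numerical stability. A full treatment would also need the predictive covariance and inference over the hyperparameters, which this sketch leaves out.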

All of the above will obey the following:

  1. No current content of these sections will be omitted; it will just be reworked in a pedagogically concise way.
  2. Notation for this section will be rewritten to match the notation in BDA3 as much as possible.
  3. Notation will be unified with common GP literature such as GPML. Users should be able to connect our documentation to what they see in applied GP papers.
  4. The posterior predictive section will emphasize clear statistical graphics to make the documentation more memorable and illustrative.

Thoughts?

Andre

This sounds great! If you want to rewrite it following these principles I think that would definitely be a big help to our users.

Cool thanks, do I have a reviewer? For some reason it’s not letting me quote you.

I probably have the same degree of expertise (low-to-middling) and interest (high) in GPs as the intended audience of this rewrite, so I’d be happy to help review (in the edit sense, not the official dev-reviewer sense).

In addition to @mike-lawrence, maybe @avehtari and/or @rtrangucci? Anyway, these updates would be great and I’m sure we can find reviewers whenever you’re ready for someone to take a look (I’m not a GP expert but I can help track down reviewers when the time comes).

Create a PR. Everyone’s totally OK with updating the user’s guide and reference manual.

Did this new section ever happen? Looking for some reference material on this topic!

Edit: the easiest-to-follow example I’ve found for grasping this is: An Introduction to Gaussian Process Regression - Dr. Juan Camilo Orduz
Though it’s Python-based!
