Adding Gaussian Process Covariance Functions

This gets a little philosophical – in some sense the question is whether or not there are other processes (probability distributions over Hilbert spaces) with covariance functions K(x1, x2) that marginalize to corresponding probability distributions with covariance matrices K_{ij} = K(x_i, x_j). I believe that only Gaussian processes have this property, in which case mathematically we would indeed only ever use these functions for GPs. Even if there are some exceptions, however, I can’t think of a practical use for them otherwise.
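For concreteness, the marginalization property in question: if $f \sim \mathcal{GP}(m, K)$, then for any finite set of inputs $x_1, \dots, x_n$ the function values are jointly Gaussian,

$$
\begin{pmatrix} f(x_1) \\ \vdots \\ f(x_n) \end{pmatrix}
\sim \mathcal{N}\!\left(
\begin{pmatrix} m(x_1) \\ \vdots \\ m(x_n) \end{pmatrix},\; K \right),
\qquad K_{ij} = K(x_i, x_j).
$$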

Do other people have thoughts?

My understanding is that this is true of any elliptical distribution - it is discussed a bit in Shah et al. 2014, with a reference to Fang et al., to which I don’t have easy access.

My apologies if this is dragging up a dead thread - I saw it on my feed as modified a few days ago even though the last post seems to be from 2/10…

Hi all -

I think this is a good place to post this. Had a good conversation with @syclik today. I’m going to work on developing these covariance functions.

First up is the ARD prior, in the primitives. I’ll first add support for real length_scale[N] and vector[N] length scale arguments, then update rev's cov_exp_quad.hpp to handle the length scale arguments with better performance.
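To make the ARD idea concrete, here's a rough standalone sketch of the vector-length-scale version (a hypothetical function, not the actual Stan Math signature):

```cpp
#include <cmath>
#include <vector>
#include <Eigen/Dense>

// Sketch of an ARD squared-exponential kernel: one length scale per
// input dimension instead of a single scalar.
// k(x, x') = sigma^2 * exp(-0.5 * sum_d ((x_d - x'_d) / l_d)^2)
Eigen::MatrixXd cov_exp_quad_ard(const std::vector<Eigen::VectorXd>& x,
                                 double sigma,
                                 const Eigen::VectorXd& length_scale) {
  const int n = x.size();
  const double sigma_sq = sigma * sigma;
  Eigen::MatrixXd K(n, n);
  for (int i = 0; i < n; ++i) {
    K(i, i) = sigma_sq;
    for (int j = i + 1; j < n; ++j) {
      // Scale each squared coordinate difference by its own length scale.
      double sq_dist =
          ((x[i] - x[j]).array() / length_scale.array()).square().sum();
      K(i, j) = K(j, i) = sigma_sq * std::exp(-0.5 * sq_dist);
    }
  }
  return K;
}
```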

If you all have ideas about what I should tackle, suggestions are appreciated!

3 Likes

Hi all -
From my understanding, all of the optimized functions (in the autodiff paper) are written in the rev library, and these are not accessible via prim. If this is the case, is it OK if I add support for just std::vector<double> for the length scale in prim and then go back and write the more “optimal” version in rev, using the functions in the autodiff paper?

If this is the case, I can submit the prim version and relevant test cases for cov_exp_quad for the ARD priors. My GitHub name is also drezap; do I need permission to push?

Feel free to just send me a link/paper/etc.

We always have some accessible version in prim. The optimized versions are often written once in prim and, using template metaprogramming, are structured so they’re efficient for rev as well.
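As a toy illustration of that pattern (not actual Stan Math code): one templated prim definition instantiates for double and, because autodiff types like stan::math::var overload the arithmetic operators and exp, works for rev as well.

```cpp
#include <cmath>
using std::exp;  // unqualified call so ADL can also find e.g. stan::math::exp

// One definition serves both prim (T = double) and rev (T = var).
template <typename T>
T sq_exp_kernel(const T& x1, const T& x2, const T& sigma, const T& l) {
  T d = x1 - x2;
  return sigma * sigma * exp(-0.5 * d * d / (l * l));
}
```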

You should be able to create a pull request without having permissions to push on the repo.

1 Like

The main paper is our arXiv paper on reverse mode autodiff: https://arxiv.org/abs/1509.07164

Some functions are written once in prim with templates and others are specialized in rev and in fwd. Most of the optimization attention’s been put into rev as it’s the only version we use in Stan interfaces.

Yes, you can specialize just some of the arguments. You just need to make sure that everything else can compile, so sometimes it requires a bit of enable_if or disable_if traits metaprogramming to make sure there aren’t two definitions of the same instantiation.
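A toy example of that guard (illustration only, not Stan Math code): complementary enable_if conditions keep the generic and specialized templates from both matching the same type, which would otherwise give two definitions of the same instantiation.

```cpp
#include <type_traits>

// Generic version: participates in overload resolution only for
// non-arithmetic types (e.g. autodiff scalar types).
template <typename T,
          typename std::enable_if<!std::is_arithmetic<T>::value, int>::type = 0>
T twice(const T& x) {
  return x + x;
}

// Specialized version for plain arithmetic types; the complementary
// condition guarantees exactly one template matches any given T.
template <typename T,
          typename std::enable_if<std::is_arithmetic<T>::value, int>::type = 0>
T twice(const T& x) {
  return 2 * x;
}
```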

1 Like

Are there any kernels anyone would like to see next? I see some discussion about a Matern kernel and a Student-t; we could also do a radial basis or rational quadratic… I’m happy to take suggestions.

Have we reached a consensus on the naming convention for this library? It’s hard to tell what’s already been implemented and what’s not, unless I just haven’t found the right resource. I can go ahead and fix the naming…

Great. I think @avehtari is the one to ask, but @syclik and @betanalpha might also have opinions.

@nigul has implemented a periodic one: https://github.com/stan-dev/math/pull/677
It would be good to check that what you and he have been doing has a similar structure.

For some of my experiments I would like to have a dot-product covariance (e.g. p. 80 in the GPML book). It corresponds to an integrated linear model with a Gaussian prior on the coefficients, but sometimes it’s handy to include that part in the GP covariance instead of modeling the linear part explicitly.
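For reference, the simplest form of that covariance (the linear kernel on GPML p. 80) is

$$
k(x, x') = \sigma_0^2 + x^\top x',
$$

which is what you get by integrating out the weights of a linear model with a standard normal prior on the coefficients.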

Matern would be a good choice in general. It would probably be enough to have Matern with nu=1/2 (same as exponential), nu=3/2, and nu=5/2, as then we don’t need to compute the modified Bessel function (see e.g. p. 84-85 in the GPML book). Of course there could be cases where nu is a continuous and unknown parameter, but I don’t think applications where nu would be well specified and the GP would still be fast are common.
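For reference, the Bessel-free closed forms (with $r = |x - x'|$, length scale $\ell$, and marginal standard deviation $\sigma$; see the GPML pages cited above) are

$$
\begin{aligned}
k_{\nu=1/2}(r) &= \sigma^2 \exp\!\left(-\tfrac{r}{\ell}\right),\\
k_{\nu=3/2}(r) &= \sigma^2 \left(1 + \tfrac{\sqrt{3}\,r}{\ell}\right) \exp\!\left(-\tfrac{\sqrt{3}\,r}{\ell}\right),\\
k_{\nu=5/2}(r) &= \sigma^2 \left(1 + \tfrac{\sqrt{5}\,r}{\ell} + \tfrac{5r^2}{3\ell^2}\right) \exp\!\left(-\tfrac{\sqrt{5}\,r}{\ell}\right).
\end{aligned}
$$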

The set of exp_quad, periodic, dot-product, and Matern would be good enough that I would then start thinking about how to speed up GPs (fast basis function approximations for low-dimensional cases D<=2, inducing point approximations for moderate-dimensional cases D>=2, approximate integration over latent values).

Hi -
I will be submitting a dot product covariance function in prim later today, so I want to make all naming consistent before I do this.

  1. Naming convention -
    I’m going to update the naming convention (prior to adding more cov functions) as specified here: https://github.com/stan-dev/stan/wiki/Adding-a-Gaussian-Process-Covariance-Function, with the addition that each gp covariance function (and ncp, lpdf, and rng) starts with gp. I like what @rtrangucci said: as a new user, I had no idea what cov_exp_quad was until I started working on the function. If it has the prefix gp, it will be clear that it is part of the gp library.

  2. Argument for the “order” of naming convention.
    I will start with gp so that it is clear that it is a member of the gp library, then the name of the covariance function, and then the suffix, which should be one of cov, ncp, lpdf, or rng. This is very intuitive for a new user. For example, from the name gp_exp_quad_cov we know that exp_quad is a member of the gp library and that it is the ‘cov’ member of gp's exp_quad, i.e. gp is a super-set of exp_quad, which is a super-set of cov. This is also the logic in Mandarin, which I believe is very easy to grasp for new learners: when speaking about a location or date, the ‘larger’ set always comes first. Today is 2018/2/7. Very little effort is required to understand this convention.

    Also, for the user to see what kernels are available to them, in Linux one can just type $ ls gp_* to get a list of all covariance functions and members. Right now, I can’t easily tell if the lpdf, etc., are implemented for exp_quad. The benefits of this convention are numerous.

  3. Deprecation -
    With that said, I’m going to deprecate cov_exp_quad and change it to gp_exp_quad_cov, and later introduce the dot product kernel, which I will name gp_dot_prod_cov.

This will happen today unless there are objections.

1 Like

Doesn’t the spec we developed in (1) already include the gp prefix? I think we had all agreed on that naming convention already so go for it.

A PR to add proper names for the existing cov_exp_quad would be a great start. Just make sure to leave the existing functions as wrappers to the updated functions so that we don’t break backwards compatibility.
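The wrapper can be a thin forwarder, roughly like this (sketch; the real functions are templated over more argument combinations than shown):

```cpp
#include <utility>

// Keep the deprecated name as a forwarding wrapper so existing models
// that call cov_exp_quad keep compiling after the rename.
template <typename... Args>
inline auto cov_exp_quad(Args&&... args)
    -> decltype(gp_exp_quad_cov(std::forward<Args>(args)...)) {
  return gp_exp_quad_cov(std::forward<Args>(args)...);
}
```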

1 Like

This’ll probably need an issue in stan-dev/stan to deprecate the old versions at the same time the new ones are added.

1 Like

Hi -

Adding gp_dot_prod_cov.hpp in prim; the PR is open right now. I need to re-write the rev stuff, so I’m just adding prim for now.

Posting this here so I hold myself accountable: by May 10th, 2018, there will be support (at a bare minimum in prim) for Matern nu=1/2, nu=3/2, and nu=5/2. After that I’ll clean up the deprecation. Apologies for the open PRs.

1 Like

We’re not going to hold you accountable. Would you mind closing the PR until it’s ready?

One of my early lessons in software project time estimation was to distrust estimates based on round calendar times (e.g., four weeks from today). I’ve found the best way to hit deliverable deadlines is to break a project down hierarchically into chunks of no longer than one or two days and estimate from there.

3 Likes

Looks like there is more than one parameterization of the Matern. In GPML, they leave out a gamma (a constant multiplier on the distance) and a sigma (standard deviation).

In other references (Wikipedia, scikit-learn), the sigma is included, and they add an additional constant multiplier to d(x, x’).

I think having more flexibility is good, so unless there are objections, I’m including both of these additional parameters with 1.0 as the default.
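In other words (this is my reading of the proposal; the exact placement of gamma may differ), the extended form would be

$$
k(x, x') = \sigma^2 \, k_\nu\big(\gamma \, d(x, x')\big),
$$

where $k_\nu$ is the base Matern and setting $\sigma = \gamma = 1$ recovers the GPML parameterization.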

Also, what about an ARD prior for the length scale in the Matern (a vector argument for the length scale)?

Are defaults for parameters OK for Stan/math?

Yes, but we don’t have much use for them.

Given that there aren’t defaults in the Stan language, we need to write separate functions with fewer arguments for the different uses if we’re going to plug them into Stan.
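Concretely, that means exposing separate overloads instead of a default argument; a sketch with hypothetical signatures:

```cpp
#include <vector>
#include <Eigen/Dense>

// Not usable from the Stan language (no default arguments there):
//   Eigen::MatrixXd gp_matern32_cov(const std::vector<double>& x,
//                                   double length_scale, double sigma = 1.0);
// Instead, separate overloads (names and signatures hypothetical):
Eigen::MatrixXd gp_matern32_cov(const std::vector<double>& x,
                                double length_scale, double sigma);
Eigen::MatrixXd gp_matern32_cov(const std::vector<double>& x,
                                double length_scale);  // sigma fixed at 1.0
```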

1 Like

@Bob_Carpenter thanks for the response. Please see the updated PR.

I guess defaults would have no effect on sampling.

I kept sigma in and removed the gamma parameter, as this is consistent with the squared exponential kernel (cov_exp_quad.hpp). I’ll open another PR with the gamma parameter, since it’s already implemented.

I hope this function:

gets added as well.

@andre.pfeuffer if you promise me a case study I’ll throw it in for you!

Different function name for different numbers of parameters, or overload it?