Adding Gaussian Process Covariance Functions

Per the discussion at the meeting today, I wrote down the required function implementations for adding a Gaussian process to the Stan Math Library: https://github.com/stan-dev/stan/wiki/Adding-a-Gaussian-Process-Covariance-Function.

I tried to keep the same naming prefix convention that the cov_exp_quad function uses, but that would lead to inconsistencies with the suffix notation the log density functions use, so I went with the latter. Hopefully this won’t be too much of a conflict, as we can always keep cov_exp_quad around and deprecate it when we introduce exp_quad_cov, exp_quad_chol, and exp_quad_lpdf.
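For concreteness, here is a sketch of what the suffixed family might look like; the argument lists just mirror the existing cov_exp_quad(x, alpha, rho) signature and are illustrative rather than final:

```stan
// Illustrative signatures only, not a final design.
matrix exp_quad_cov(real[] x, real alpha, real rho);   // covariance matrix K
matrix exp_quad_chol(real[] x, real alpha, real rho);  // Cholesky factor of K
real exp_quad_lpdf(vector y, real[] x, real alpha, real rho);  // GP marginal density,
                                                               // used as y ~ exp_quad(x, alpha, rho)
```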

@Lu.Zhang, eventually we’d want your NNGP to be implemented with this pattern so that it will be easy for our users to incorporate into their models. Does it make sense or is there anything on which I should elaborate? Thanks!

Quite often some noise term (in the Gaussian case) or jitter term (in the non-Gaussian case) is added to the diagonal of the covariance matrix before computing the Cholesky factor. That is, we usually want to compute something related to Chol(K+Z) (even better, Chol(K+Z)\y) or logdet(K+Z), where Z is a diagonal matrix. If covariance matrices with special structure (reduced rank or Kronecker) are used, then the Woodbury-Sherman-Morrison formula is used to handle the sum of the specially structured K and the diagonal matrix Z.

To get speed, the simplest composite covariance we need is K+Z, where both K and Z have some special structure that gives less than O(n^3) computation for the Cholesky factor, inversion, or log determinant, but these special structures can be different (like low rank, Kronecker, diagonal, or block diagonal). Currently I don’t see what benefit we would get from having a fast Chol(K) if we need Chol(K+Z) (but maybe I forgot something).
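For reference, with reduced-rank K = U C Uᵀ (U being n-by-m with m ≪ n) and diagonal Z, the Woodbury-Sherman-Morrison identity and the matrix determinant lemma give

$$(K + Z)^{-1} = Z^{-1} - Z^{-1} U \left(C^{-1} + U^\top Z^{-1} U\right)^{-1} U^\top Z^{-1},$$

$$\log\det(K + Z) = \log\det\left(C^{-1} + U^\top Z^{-1} U\right) + \log\det C + \log\det Z,$$

so both the solve against K + Z and its log determinant cost O(n m²) rather than O(n³).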

Also, I still think it would be a good idea to allow composite covariance functions, even with just scaling, sum, and product. This would already cover most of what GPstuff can do. But I guess an efficient implementation in Stan would require really working in terms of covariance functions instead of covariance matrices.

Aki


Forgot to add that I’m thinking of generic composite covariance functions with scaling, sum, and product mostly for “dense” covariance functions (this is easy), while for special structures (low rank, sparse, Kronecker) only limited compositions would be allowed (this is hard).

Aki
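To make the composite idea above concrete, here is a minimal sketch of scaling, sum, and elementwise product of dense covariance matrices in a model; periodic_cov is a hypothetical kernel, and exp_quad_cov follows the naming proposed earlier:

```stan
// Sketch only: composing dense covariance matrices directly.
matrix[N, N] K = square(tau) * exp_quad_cov(x, alpha1, rho1)   // scaling
                 + exp_quad_cov(x, alpha2, rho2)               // sum
                   .* periodic_cov(x, alpha3, rho3, p);        // product (hypothetical kernel)
// Diagonal noise/jitter term Z as discussed above:
matrix[N, N] L = cholesky_decompose(K + diag_matrix(rep_vector(square(sigma), N)));
```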

Hi Michael,

Thank you so much! Yes, I have a question. Do I have to provide the covariance matrix and Cholesky factor? Since the NNGP constructs the precision matrix directly in the calculation of the log-likelihood, I don’t think I will have a function that returns a dense covariance matrix. Thank you again for your help!

Best,

Lu Zhang

Then exp_quad should be defined with the +Z explicitly, and users can set sigma to a parameter or some hard-coded small value. Similarly, exp_quad-like covariance functions with special structure would have the same +Z in their definitions so that WSM and the like can be exploited in the implementation. The point is that we do not want to (a) code a huge number of specialized linear algebra functions or (b) require that users ensure they call them all appropriately. All of that should be abstracted away in the GP definition.
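As a minimal sketch of the intended usage, assuming a hypothetical exp_quad_chol that folds the diagonal term into the kernel definition:

```stan
// Hypothetical signature: sigma is part of the kernel, so the implementation
// computes the Cholesky factor of K + sigma^2 * I internally and is free to
// exploit any special structure of K + Z without user intervention.
matrix[N, N] L = exp_quad_chol(x, alpha, rho, sigma);
```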

Even a restricted algebra of covariance functions requires adding far too much to the math library and the language to be supported at this point.

Lu, you’ll have to invert the precision matrix, possibly exploiting its sparsity. The idea is that we want all of our covariance functions to be compatible with each other, so that, for example, we could add the NNGP to an exp_quad GP if necessary. The Cholesky factor of the covariance is particularly important, as we need it for non-centered parameterizations of GPs, which are extremely important for ensuring robust fits in complex models.

To start, these don’t have to be super-efficient implementations: you can just assume the precision matrix is dense and invert it naively for now, and then later consider exploiting its sparsity structure for a more efficient implementation.
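For context, the non-centered parameterization mentioned above is the standard pattern below, with L a Cholesky factor of the GP covariance computed earlier in the model; this is why covariance functions need to expose a Cholesky factor:

```stan
parameters {
  vector[N] eta;           // standard normal innovations
}
transformed parameters {
  vector[N] f = L * eta;   // implies f ~ multi_normal(0, L * L')
}
model {
  eta ~ normal(0, 1);      // non-centered: the prior acts on eta, not f
}
```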

Okay, I get it. Thank you so much!

Best,

Lu Zhang

@betanalpha can you clarify the naming convention with an example? Are you envisioning deprecating cov_exp_quad and encouraging exp_quad_cov? It just wasn’t clear enough in the doc for me to answer that question.

Not encouraging, requiring. The suffix notation is consistent with the probability densities, which takes priority over the single GP covariance function implementation in the language.

Cool. We’ll need an issue to deprecate cov_exp_quad() and rename it exp_quad_cov(). And make everything else follow that convention.


Can we prepend the functions with something like kernel or Gram? exp_quad seems too vague to me.

Are you worried about alphabetization? What do you recommend as the whole name for the function: gram_exp_quad or exp_quad_gram? Or was it exp_quad you objected to?

I talked to Rob about this the other day – he was referring to the fact that there’s nothing in the naming convention that specifies that these are related to Gaussian processes. We considered also requiring a _gp_ tag, as in exp_quad_gp_cov, exp_quad_gp_chol, and exp_quad_gp_lpdf, but then we have seven characters minimum before we even add the specific implementation name.

That seemed a bit onerous, but I could be convinced that the clarity is worth it.

I was thinking a bit more about @avehtari’s request for a summation function over GP covariance functions. Again, there are various technical issues, and it’s not clear how often people run into memory issues that might motivate an explicit function, but if we do want something then I think the best next step would be something like sum_matrix_chol and sum_matrix_normal_lpdf, which would take an array of covariance matrices and return the Cholesky factor of their sum or the normal density using the summed covariance. These would reduce the burden on the autodiff about as much as possible.
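For comparison, a sketch of what a user writes today without such functions; each binary + allocates an intermediate matrix and its autodiff nodes, which the proposed functions would collapse into a single pass:

```stan
// Today's pattern: K1 + K2 and (K1 + K2) + K3 each live on the autodiff
// stack before the Cholesky factor is taken.
matrix[N, N] L = cholesky_decompose(K1 + K2 + K3);
y ~ multi_normal_cholesky(mu, L);
```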


Mike is right; I was worried that there wasn’t anything explicit about the functions exp_quad_cov, exp_quad_chol, and exp_quad_lpdf being related to GPs.

As we expand the kernel options (Matern 3/2, 5/2, etc.) it’ll be hard for users to find these functions in the doc appendix easily. I’d actually advocate for something like gp_exp_quad_cov, gp_exp_quad_lpdf, etc.


If we add an additional identifier, I’d prefer that it go with the suffixes so we don’t have an awkward sandwich of naming requirements. Do people think that `…_gp_lpdf` is too long or okay for a suffix?

Right. Alphabetization. I wouldn’t worry about that. They’ll want to look in a chapter of the manual where we’ll gather them all together just to see what they are.

I don’t like adding gp to these, because there’s nothing GP-specific about them. They’re just functions for constructing covariance matrices. The fact that you can use them in GPs is neither here nor there. By the same token, I wish I hadn’t called the basic types cov_matrix and cholesky_factor_cov; it should’ve been spd instead of cov, because they’re just symmetric positive definite matrices independently of their use as covariance matrices.

We could do this syntactically without doing anything other than defining a new function:

cov_matrix[K, K] A;
cov_matrix[K, K] B;
cov_matrix[K, K] C;
...
cov_matrix[K, K] ABC = sum({A, B, C});

which would be a lot more memory efficient than A + B + C as addition is currently implemented, since each binary + allocates its own intermediate matrix and autodiff nodes.

What Michael’s suggesting will be the most efficient, but it restricts usage to the compound functions for which it’s defined. I don’t know if the extra efficiency will be worth the lack of generality.

Actually they are GP-specific. What defines a GP (to first-order) is exactly the function mapping a set of input covariates to a covariance matrix over them.

Just checking that the logic goes both ways. I think you’re claiming that whatever I do with the output of one of these functions, you’re going to call the model a GP? If that’s right, then adding _gp to the functions makes sense.