# More efficient anisotropic calculation for Gaussian Process Kernel?

Hello! In Chapter 4 of Rasmussen and Williams (pg. 89) it mentions that
“Anisotropic versions of these isotropic covariance functions can be created by setting
`r^2(x, x') = (x − x')^T M (x − x')` for some positive semidefinite `M`.”

Assuming `M` is `DxD` and the `x` are `D`-dimensional vectors, with `N` of them, one could compute

``````
for (i in 1:N-1)
{
  Omega[i,i] = 0.0;
  for (j in (i+1):N)
  {
    Omega[i,j] = (x[i] - x[j]) * M * (x[i] - x[j])';
    Omega[j,i] = Omega[i,j];
  }
}
Omega[N,N] = 0.0;
``````

assuming `M` is symmetric positive definite.

My question is, how can this be vectorized? Is there a way of making a full matrix out of the `x` data and doing something more efficient than the double loop? Any suggestions are most welcome!
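For reference, here is the kind of thing I would do in numpy (a sketch; the variable names are my own). Since `M = L L'` for some Cholesky factor `L`, we have `(x - x')' M (x - x') = ||L'(x - x')||^2`, so the data can be transformed once up front and then ordinary pairwise squared Euclidean distances suffice:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 5, 3
x = rng.normal(size=(N, D))

# Build a positive-definite M via a random factor (M = A A' + D*I), then take its Cholesky.
A = rng.normal(size=(D, D))
M = A @ A.T + D * np.eye(D)
L = np.linalg.cholesky(M)          # lower triangular, M = L @ L.T

# (x_i - x_j)' M (x_i - x_j) = ||L'(x_i - x_j)||^2, so transform once:
Z = x @ L                          # row i of Z is (L' x_i)'
diff = Z[:, None, :] - Z[None, :, :]          # N x N x D pairwise differences
Omega = np.einsum('ijd,ijd->ij', diff, diff)  # squared Euclidean distances

# Check against the explicit double loop.
Omega_loop = np.array([[(x[i] - x[j]) @ M @ (x[i] - x[j]) for j in range(N)]
                       for i in range(N)])
assert np.allclose(Omega, Omega_loop)
```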

2 Likes

Okay, update: suppose I know that `M` has Cholesky factor `L`, so `L L' = M`. Would the following be faster?

``````
matrix[D,D] L_transpose = L'; // I assume this costs flops so we should do it once and not in the loop?
for (i in 1:N-1)
{
  Omega[i,i] = 0.0;
  for (j in (i+1):N)
  {
    Omega[i,j] = dot_self(L_transpose * (x[i]-x[j])');
    Omega[j,i] = Omega[i,j];
  }
}
Omega[N,N] = 0.0;
``````

That is, does Stan take advantage of the lower triangular structure in the matrix-vector multiply?

1 Like

Hi,
sorry for not getting to you earlier. I think this math stuff would be better handled by @stevebronder or @syclik, as it goes a bit over my head. But I can answer a subquestion:

AFAIK Stan doesn’t do any optimization like this “magically”. There are some specialized functions to handle triangular matrices like `mdivide_left_tri_low` but I don’t think there is anything for multiplication.

However, implementing this yourself should be easy. Stan is compiled to C++ and pays very little penalty for loops (unlike R). Obviously this may not help much, especially if this is not a bottleneck in your model.

Hope that helps at least a little and hope others will be able to provide a bit better advice.

2 Likes

So @martinmodrak you are suggesting something like:

``````
for (i in 1:N-1) {
  Omega[i,i] = 0.0;
  for (j in (i+1):N) {
    Omega[i,j] = 0.;
    for (k in 1:D) {
      real val = 0.;
      for (l in k:D) {
        val += L[l,k]*(x[i,l]-x[j,l]); // (L' * d)[k]; L is lower triangular
      }
      Omega[i,j] += square(val);
    }
    Omega[j,i] = Omega[i,j];
  }
}
Omega[N,N] = 0.0;
``````

(if I have my math right), where each `x[i]` is `D`-dimensional.

What impact (if any) does this have on the AD stack? Is the AD “smart enough” to know to aggregate the same values of `L[k,l]` for differentiation?

Thanks!

Update: doing some timings locally, this recent loop method is very slow.

Of the things I tried, in order from slowest to fastest (or least slow, in this case): the `dot_self` version, then `quad_form_sym`, then `quad_form`. The `quad_form` version is fastest, but still very, very slow.

In python, I would precompute all of the `x[i]-x[j]` into a matrix, call it `X`.

Then `X` is `NxNxD` and the above computation (in python/numpy) would be:
`Omega = (np.dot(X, M) * X).sum(axis=2)`.
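As a quick sanity check of that numpy expression (a sketch; `X` here is the `NxNxD` array of pairwise differences `x[i]-x[j]`), it agrees with an explicit `einsum` of the quadratic form:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 4, 3
x = rng.normal(size=(N, D))
A = rng.normal(size=(D, D))
M = A @ A.T + np.eye(D)            # symmetric positive definite

X = x[:, None, :] - x[None, :, :]            # N x N x D pairwise differences
Omega = (np.dot(X, M) * X).sum(axis=2)       # the expression above
Omega_einsum = np.einsum('ijd,de,ije->ij', X, M, X)
assert np.allclose(Omega, Omega_einsum)
```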

So question: is there a way of doing matrix multiplication with a tensor in stan?

I would like to compute `X` in the transformed parameters block as dimension `NxNxD` and do:

``````
some_type Omega_pre_sum[N,N,D] = (X * M) .* X;
for (i in 1:N) {
  for (j in 1:N) {
    Omega[i,j] = sum(Omega_pre_sum[i,j,:]);
  }
}
``````

is this possible?

3 Likes

I admit, this is getting out of my expertise. Asking @stevebronder or @rok_cesnovar if they have time to discuss this.

Bumping this up to hopefully get you more attention. Maybe @bgoodri has an idea?

I also think that partial indexing via `sum(Omega_pre_sum[i,j,])` or `sum(Omega_pre_sum[i,j])` should work out of the box.

Thanks for bumping this! Just an update:

Best so far (in terms of time): (assume `M = L*L'` in `DxD`)

``````
for (i in 1:N-1) {
  Omega[i,i] = 0.0;
  for (j in (i+1):N) {
    Omega[i,j] = sum((X[i,j]*M) .* X[i,j]);
    Omega[j,i] = Omega[i,j];
  }
}
Omega[N,N] = 0.0;
``````

where `X` is `row_vector[D] X[N,N]`

I will say, with `D = 64` and `N = 1000` this is extremely slow. After profiling it, the forward pass is about 1/3 the time and the autodiff takes 2/3.

I would love to “vectorize” this more and take advantage of matrix derivatives if there is a way to do so.
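For concreteness, here is the matrix-level identity I would like to exploit, sketched in numpy (with `M = L L'` and `Z = x L`): `Omega[i,j] = ||z_i||^2 + ||z_j||^2 - 2 z_i . z_j`, i.e. one Gram-matrix multiply plus row norms. I believe the Stan analogue would be a single `Z * Z'` product plus `rows_dot_self(Z)`:

```python
import numpy as np

rng = np.random.default_rng(2)
N, D = 6, 4
x = rng.normal(size=(N, D))
A = rng.normal(size=(D, D))
M = A @ A.T + np.eye(D)
L = np.linalg.cholesky(M)

Z = x @ L                                     # N x D transformed data
sq = np.einsum('ij,ij->i', Z, Z)              # row squared norms ||z_i||^2
G = Z @ Z.T                                   # Gram matrix, z_i . z_j
Omega = sq[:, None] + sq[None, :] - 2.0 * G   # (x_i - x_j)' M (x_i - x_j)
Omega = np.maximum(Omega, 0.0)                # clip tiny negative round-off

# Check against the explicit double loop.
ref = np.array([[(x[i] - x[j]) @ M @ (x[i] - x[j]) for j in range(N)]
                for i in range(N)])
assert np.allclose(Omega, ref, atol=1e-8)
```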

Any ideas are MOST welcome!

Dan

1 Like

Does using a `dot_product` speed up the calculation?

``````
...
Omega[i, j] = dot_product(X[i, j] * M, X[i, j]);
...
``````

You can also try dropping the explicit symmetrization inside the loop and instead calling `symmetrize_from_lower_tri(Omega)` once after the loop, though I'm not sure you'll get much from it.

1 Like

That helped a bit. Apologies, but re-looking at the different versions I have tried, the fastest (by a considerable margin) was

``````
for (i in 1:N-1) {
  Omega[i,i] = 0.0;
  for (j in (i+1):N) {
    Omega[i,j] = quad_form(M, X[i,j]);
    Omega[j,i] = Omega[i,j];
  }
}
Omega[N,N] = 0.0;
``````

where `X` is now `vector[D] X[N,N]`.

I guess I could construct a very large block matrix out of the `X[i,j]` somehow?