Hello! In Chapter 4 of Rasmussen and Williams (pg. 89) it mentions that
“Anisotropic versions of these isotropic covariance functions can be created by setting r^2(x, x') = (x − x')^T M (x − x') for some positive semidefinite M.”

Assuming M is DxD, and x are D-dimensional vectors, and there are N of them, one could compute

for (i in 1:N-1)
{
Omega[i,i] = 0.0;
for (j in (i+1):N)
{
Omega[i,j] = quad_form_sym(M, (x[i]-x[j])');
Omega[j,i] = Omega[i,j];
}
}
Omega[N,N] = 0.0;

assuming M is symmetric positive definite.

My question is, how can this be vectorized? Is there a way of making a full matrix out of the x data and doing something more efficient than the double loop? Any suggestions are most welcome!

Okay update: If I know that M has Cholesky factor L so L L' = M. Then would the following be faster:

matrix[D,D] L_transpose = L'; // I assume this costs flops so we should do it once and not in the loop?
for (i in 1:N-1)
{
Omega[i,i] = 0.0;
for (j in (i+1):N)
{
Omega[i,j] = dot_self(L_transpose * (x[i]-x[j])');
Omega[j,i] = Omega[i,j];
}
}
Omega[N,N] = 0.0;

That is, does Stan take advantage of the lower triangular structure in the matrix-vector multiply?

Hi,
sorry for not gettign to you earlier. I think this math stuff would be better handled by @stevebronder or @syclik, it goes a bit over my head. But I can answer a subquestion:

AFAIK Stan doesn’t do any optimization like this “magically”. There are some specialized functions to handle triangular matrices like mdivide_left_tri_low but I don’t think there is anything for multiplication.

However, implementing this yourself should be easy. Stan is compiled to C++ and pays very little penalty for loops (unlike R). Obviously this may not help much, especially if this is not a bottleneck in your model.

Hope that helps at least a little and hope others will be able to provide a bit better advice.

for (i in 1:N-1) {
Omega[i,i] = 0.0;
for (j in (i+1):N) {
real val = 0.;
Omega[i,j] = 0.;
for (k in 1:D) {
for (l in k:D) {
val += L[k,l]*(x[i,l]-x[j,l]);
}
Omega[i,j] += square(val);
}
Omega[j,i] = Omega[i,j];
}
}
Omega[N,N] = 0.0;

(if I have my math right). Where x is D dimensional.

What impact (if any) does this have on the AD stack? Is the AD “smart enough” to know to aggregate the same values of L[k,l] for differentiation?

You can try removing the symmetrize allocation and call outside the loop with symmetrize_from_lower_tri(Omega) though I’m not sure you’ll get much from it.