diagSPD for Matérn GP kernels other than 3/2

Hi everyone,

I’ve been fitting some Gaussian Process models using the Hilbert space approximation, adapting code from the tutorials by @avehtari here and here. I would like to fit a model which uses a Matérn 1/2 kernel, but I am not sure how to adapt the diagSPD_Matern32 function to the other Matérn kernels - any assistance with this would be greatly appreciated.

Here is the function in question:

vector diagSPD_Matern32(real alpha, real rho, real L, int M) {
    return 2*alpha * (sqrt(3)/rho)^1.5 * inv((sqrt(3)/rho)^2 + ((pi()/2/L) * linspaced_vector(M, 1, M))^2);
   }

Thanks!

See discussion here: Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming - #5 by js592

2 Likes

FYI this was recently implemented in brms (you may need to install from github if it hasn’t made it into a release yet). You could use a formula like

y ~ gp(x, cov="exponential", k=10)

where k is the number of basis functions. Depending on your needs you could either use the brms model directly or use make_stancode() and copy the relevant functions to your own Stan model.

2 Likes

Here’s what we think it is (any review / error spotting welcome, see related PR):

2 Likes

That’s great, thanks!

Do you have a link to the maths vingette?

My implementation looks like:

vector diagSPD_exp(real alpha, real rho, real L, int m) {
    vector[m] lambda = (linspaced_vector(m, 1, m)*pi()/(2*L))^2;
    vector[m] diag = 2*alpha*rho/(1+rho^2*sqrt(lambda)^2);
    diag = diag.^(0.5);
    
    return diag;
}

testing both with alpha = 1, rho = 0.5, L = 1, m = 8 I got:

[0.847367,0.748398,0.677581,0.623686,0.580895,0.545854,0.516474,0.491379] (yours)
vs
[0.786439,0.537029,0.390683,0.303314,0.246773,0.207584,0.178955,0.157177] (mine)

I’m actively using this code in my work so I want to make sure I didn’t make any mistakes in my derivation :)

Edit: I have narrowed down the difference to how the \lambda^2 term is coded. E.g. modifying:

vector diagSPD_Matern12(real alpha, real rho, real L, int M) {
  vector[M] indices = linspaced_vector(M, 1, M);
  real factor = 2;
  vector[M] denom = rho * ((1 / rho)^2 + (pi() / 2 / L)^2 * indices^2);
  return alpha * sqrt(factor * inv(denom));
}

gives identical
[0.786439,0.537029,0.390683,0.303314,0.246773,0.207584,0.178955,0.157177]
(or vice vesa if you change sqrt(lambda)^2 to sqrt(lambda) in the other function)

That being said, I need to dig a bit deeper to see which is correct.

2 Likes

So, there are a couple of things going on here. The first component in the approximation is the spectral density, which I derived as:
s(\omega) =\frac{2 \rho \alpha}{1+\rho^2 \omega^2} = \frac{2\alpha}{ \frac{1}{\rho}+\rho\omega^2} =\frac{2\alpha}{ \rho((\frac{1}{\rho})^2+\omega^2)}.

The second component is this spectral density evaluated at the square root of the eigenvalues:
s(\sqrt{\lambda}_m) where \lambda_m = (\frac{m \pi}{2L})^2
So, I think the following implementation is correct:
s(\sqrt{\lambda}_m) = \frac{2\alpha}{ \rho((\frac{1}{\rho})^2+(\frac{m \pi}{2L})^2)}

Then, this gets an additional square root when we are using the non-centered linear representation.

3 Likes

Yes, I think you’re right (and your maths matches our derivation in Gaussian Process implementation details • EpiNow2) - thanks!

3 Likes

Happy to help.

I tested the Matern 5/2 kernel some program and it got more fitting difficulties than the Matern 3/2 kernel (longer runtime). The results however are plausible.

Further I suggest to write the code slightly more homogeneous:

vector diagSPD_Matern12(real alpha, real rho, real L, int M) {
...
  return alpha * sqrt(factor * inv(denom));
}

vector diagSPD_Matern32(real alpha, real rho, real L, int M) {
..
  return factor * inv(denom);
}
vector diagSPD_Matern52(real alpha, real rho, real L, int M) {
...
  return alpha * sqrt(factor * inv(denom));
}

I just got confused because we have two times sqrt(inv(denom)) and one time inv(denom).
Nevertheless I wanted to check the math, but couldn’t find the reference. Do you have a link?
Thank you for the work, btw!

This one? [2004.11408] Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming

Hi, for the life of me I cannot get the same result as you did for the 𝜈=5/2 case. I get the same overall shape but a factor 16/3 instead of the 3/2 you have:

S = \alpha^2 \frac{2 \sqrt{\pi} \Gamma(\nu + 1/2)(2\nu)^{\nu}}{\Gamma(\nu)\rho^{2\nu}}(...)^{-3} \\ = \alpha^2 \frac{2 \sqrt{\pi} \Gamma(3)5^{5/2}}{\Gamma(5/2)\rho^{5}}(...)^{-3} \\ = \alpha^2 \frac{2 \sqrt{\pi}\cdot 2 \cdot 5^{5/2}}{\frac{3\sqrt{\pi}}{4}\cdot \rho^{5}}(...)^{-3} \\ = \alpha^2 \frac{2 \cdot 2 \cdot 5^{5/2}}{\frac{3}{4}\cdot \rho^{5}}(...)^{-3} \\ = \alpha^2 \frac{16}{3} (\frac{\sqrt{5}}{\rho})^5(...)^{-3}

What do you think, which one is correct?

Thanks!

1 Like

Your calculation looks right to me, and is in fact what we had originally.

3 Likes
/**
 * Sqrt of spectral density for Exponentiated Quadratic (EQ) kernel.
 *
 * @param alpha Marginal standard deviation.
 * @param rho   Length-scale parameter.
 * @param L     Length of the interval [-L, L].
 * @param M     Number of basis functions.
 * @return A vector of weights for the basis functions.
 */
vector sqrt_diagSPD_EQ(real alpha, real rho, real L, int M) {
  vector[M] indices = linspaced_vector(M, 1, M);
  real factor = alpha * sqrt(sqrt(2 * pi()) * rho);
  real exponent = -0.25 * (rho * pi() / (2 * L))^2;
  return factor * exp(exponent * square(indices));
}

/**
 * Sqrt of spectral density for 1/2 Matern (Ornstein-Uhlenbeck) kernel.
 *
 * @param alpha Marginal standard deviation.
 * @param rho   Length-scale parameter.
 * @param L     Length of the interval [-L, L].
 * @param M     Number of basis functions.
 * @return A vector of weights for the basis functions.
 */
vector sqrt_diagSPD_Matern12(real alpha, real rho, real L, int M) {
  vector[M] indices = linspaced_vector(M, 1, M);
  real inv_rho = 1.0 / rho;
  vector[M] omega_sq = square(indices * pi() / (2.0 * L));
  // S(ω) = α² * (2/ρ) / ( (1/ρ)² + ω² )
  vector[M] S = (2.0 * inv_rho) ./ (inv_rho^2 + omega_sq);
  return alpha * sqrt(S);
}

/**
 * Sqrt of spectral density for 3/2 Matern kernel.
 *
 * @param alpha Marginal standard deviation.
 * @param rho   Length-scale parameter.
 * @param L     Length of the interval [-L, L].
 * @param M     Number of basis functions.
 * @return A vector of weights for the basis functions.
 */
vector sqrt_diagSPD_Matern32(real alpha, real rho, real L, int M) {
  vector[M] indices = linspaced_vector(M, 1, M);
  real k = sqrt(3) / rho;
  // sqrt(S(√λ)) = α_sd * 2k^1.5 / (k² + λ)
  real factor = 2 * alpha * pow(k, 1.5);
  vector[M] denom = k^2 + square(indices * pi() / (2 * L));
  return factor ./ denom;
}

/**
 * Sqrt of spectral density for 5/2 Matern kernel.
 *
 * @param alpha Marginal standard deviation.
 * @param rho   Length-scale parameter.
 * @param L     Length of the interval [-L, L].
 * @param M     Number of basis functions.
 * @return A vector of weights for the basis functions.
 */
vector sqrt_diagSPD_Matern52(real alpha, real rho, real L, int M) {
  vector[M] indices = linspaced_vector(M, 1, M);
  real k = sqrt(5) / rho;
  // S(ω) = α² * (16/(3*5^(5/2))) * k^5 / (k² + ω²)^3
  real numerator_S = (16.0 / (3.0 * pow(5.0, 2.5))) * pow(k, 5);
  vector[M] omega_sq = square(indices * pi() / (2.0 * L));
  vector[M] S = numerator_S ./ pow(k^2 + omega_sq, 3);
  return alpha * sqrt(S);
}

Can you verify the implementation?

For the 5/2 case (didn’t look at the others), I think the pow(5.0, 2.5) shouldn’t be there as it’s already in the term k. Removing it, I get the same numerical results as with my implementation, exposing the functions in R.

2 Likes
/**
 * Sqrt of spectral density for 5/2 Matern kernel.
 *
 * @param alpha Marginal standard deviation.
 * @param rho   Length-scale parameter.
 * @param L     Length of the interval [-L, L].
 * @param M     Number of basis functions.
 * @return A vector of weights for the basis functions.
 */
vector sqrt_diagSPD_Matern52(real alpha, real rho, real L, int M) {
  vector[M] indices = linspaced_vector(M, 1, M);
  real k = sqrt(5) / rho;
  // S(ω) = α² * (16/(3*5^(5/2))) * k^5 / (k² + ω²)^3
  real numerator_S = (16.0 / 3.0) * pow(k, 5);
  vector[M] omega_sq = square(indices * pi() / (2.0 * L));
  vector[M] S = numerator_S ./ pow(k^2 + omega_sq, 3);
  return alpha * sqrt(S);
}

Makes sense. You are right. Those functions are tricky.

3 Likes