ARD implementation Gaussian Process

Hello!
Does anyone know how to implement ARD (automatic relevance determination) for prediction, e.g. for two input vectors x1 and x2?
I checked an old implementation, but it only covered a single input vector x…
Thanks

Take a look at section 10.3 in the Stan User’s Guide. ARD is covered about a third of the way into that section.

Yeah, I did!
I’m afraid that implementation is only for a single input vector x…

In the ARD section I’m seeing

vector[D] x[N];

which declares an array of N vectors, each of length D. This means you have N observations, each with D inputs. I also see

vector<lower=0>[D] rho;

which is a vector of D length-scale parameters, one for each input.

matrix L_cov_exp_quad_ARD(vector[] x,
                          real alpha,
                          vector rho,
                          real delta) {
  int N = size(x);
  matrix[N, N] K;
  real sq_alpha = square(alpha);
  for (i in 1:(N - 1)) {
    K[i, i] = sq_alpha + delta;
    for (j in (i + 1):N) {
      K[i, j] = sq_alpha
                * exp(-0.5 * dot_self((x[i] - x[j]) ./ rho));
      K[j, i] = K[i, j];
    }
  }
  K[N, N] = sq_alpha + delta;
  return cholesky_decompose(K);
}

Here, if I have two input vectors instead of one, for example x and y, do I have to put another for loop inside for the parameter rho? I am not sure how to implement it…
Any help appreciated!!

Notice:

K[i, j] = sq_alpha * exp(-0.5 * dot_self((x[i] - x[j]) ./ rho));

(x[i] - x[j]) ./ rho is the difference between inputs x[i] and x[j], scaled by rho. Notice the ./ operator, which is elementwise division. This means each coordinate of the distance is scaled by its own length-scale, with D inputs and D length-scales in total.
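As a quick sanity check (in NumPy, with made-up numbers), the elementwise-division form and the written-out per-dimension sum are the same quantity:

```python
import numpy as np

# Illustrative values: D = 3 input dimensions, one pair of observations.
rho = np.array([1.0, 2.0, 0.5])   # one length-scale per input dimension
xi = np.array([0.3, 1.0, -0.2])
xj = np.array([0.1, 0.4, 0.5])

# Stan's dot_self((x[i] - x[j]) ./ rho): divide each coordinate difference
# by its own length-scale, then sum the squares.
dot_self = np.sum(((xi - xj) / rho) ** 2)

# Written-out equivalent: sum_d (xi_d - xj_d)^2 / rho_d^2
written_out = sum((xi[d] - xj[d]) ** 2 / rho[d] ** 2 for d in range(3))

assert np.isclose(dot_self, written_out)
```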

If you want to write out the covariance function for D dimensions without using matrices, it is:

k_{SE}(x_i, x_j;\ \alpha, \rho_1, \ldots, \rho_D) = \alpha^2 \exp\left(-0.5\sum_{d=1}^{D} \frac{(x_i^d - x_j^d)^2}{\rho_d^2}\right)

so for a 2D GP, I think the correct Stan implementation is this:

functions {
  matrix L_cov_exp_quad_ARD(vector x1, 
                            vector x2,
                            real alpha,
                            real rho1,
                            real rho2,
                            real delta) {
    int N = num_elements(x1);
    matrix[N, N] K;
    real sq_alpha = square(alpha);
    for (i in 1:(N-1)) {
      
      // Diagonal
      K[i, i] = sq_alpha + delta;
      
      for (j in (i + 1):N) {
        
        // Upper triangle
        K[i, j] = sq_alpha * 
          exp(-0.5 * square(x1[i] - x1[j]) / square(rho1)
              - 0.5 * square(x2[i] - x2[j]) / square(rho2));
        
        // Lower triangle
        K[j, i] = K[i, j];
      }
    }
    // Bottom-right corner
    K[N, N] = sq_alpha + delta;
    
    return cholesky_decompose(K);
  }
}
data {
  int<lower=1> N;
  vector[N] x1;
  vector[N] x2;
  vector[N] y;
}
transformed data {
  real delta = 1e-9;
}
parameters {
  real<lower=0> rho1;
  real<lower=0> rho2;
  real<lower=0> alpha;
  real<lower=0> sigma;
  vector[N] eta;
}
model {
  vector[N] f;
  {
    matrix[N, N] L_K = L_cov_exp_quad_ARD(x1, x2, alpha, rho1, rho2, delta);
    f = L_K * eta;
  }

  rho1 ~ inv_gamma(5, 5);
  rho2 ~ inv_gamma(5, 5);
  alpha ~ std_normal();
  sigma ~ std_normal();
  eta ~ std_normal();

  y ~ normal(f, sigma);
}
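To double-check the kernel math (not the Stan code itself), here is the same 2D ARD covariance sketched in NumPy; the function and variable names are illustrative:

```python
import numpy as np

# NumPy sketch of the 2D ARD covariance above (names are illustrative).
def L_cov_exp_quad_ard(x1, x2, alpha, rho1, rho2, delta):
    N = len(x1)
    K = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            sq_dist = ((x1[i] - x1[j]) ** 2 / rho1 ** 2
                       + (x2[i] - x2[j]) ** 2 / rho2 ** 2)
            K[i, j] = alpha ** 2 * np.exp(-0.5 * sq_dist)
    K += delta * np.eye(N)              # jitter keeps the Cholesky stable
    return np.linalg.cholesky(K)        # lower-triangular factor, like Stan

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=10), rng.normal(size=10)
L = L_cov_exp_quad_ard(x1, x2, alpha=1.0, rho1=1.0, rho2=2.0, delta=1e-9)
K = L @ L.T                             # reconstruct the covariance
assert np.allclose(K, K.T)              # valid covariance: symmetric
assert np.allclose(np.diag(K), 1.0 + 1e-9)  # diagonal is alpha^2 + delta
```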

Thanks for the reply.
So the length-scale here is declared as a real (as a type), right?
And if I have a `vector<lower=0>[D] rho`, I should do the same as you did but with the ./ operator?

Correct. I simplified the 2D model by giving each length-scale its own scalar real parameter. This also requires declaring separate data vectors for each x. However, I think the model that uses matrix/vector operations may be more efficient, but I’m not sure.

Written out with separate data vectors for each input x, I don’t think you can use the ./ operator. You would need to do something like:

K[i, j] = sq_alpha * 
          exp(-0.5 * square(x1[i] - x1[j]) / square(rho[1])
              - 0.5 * square(x2[i] - x2[j]) / square(rho[2]));

There must be a more efficient way, or a built-in function in Stan… I mean, if D = 10 this would be really slow.

Exactly. Which is why the manual implements a D-dimensional GP covariance function like this:

K[i, j] = sq_alpha * exp(-0.5 * dot_self((x[i] - x[j]) ./ rho));

with inputs and length-scales declared as:

vector[D] x[N];
...
vector<lower=0>[D] rho;
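One way to see why the vectorized form scales well: dividing each input dimension by its length-scale up front turns ARD into an ordinary isotropic kernel on the rescaled inputs, so the whole covariance matrix can be built without a per-dimension loop. A NumPy sketch with illustrative names:

```python
import numpy as np

# ARD kernel via rescaling:
# k(x_i, x_j) = alpha^2 exp(-0.5 * ||x_i ./ rho - x_j ./ rho||^2)
def ard_cov(X, alpha, rho, delta):
    Z = X / rho                                   # scale column d by rho[d]
    sq = np.sum(Z ** 2, axis=1)
    # Pairwise squared distances via the ||a||^2 + ||b||^2 - 2 a.b identity.
    sq_dists = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    K = alpha ** 2 * np.exp(-0.5 * np.maximum(sq_dists, 0.0))
    return K + delta * np.eye(len(X))

# Check against the naive loop over pairs and dimensions.
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 10))                      # N = 8 observations, D = 10
rho = rng.uniform(0.5, 2.0, size=10)
K = ard_cov(X, alpha=1.0, rho=rho, delta=1e-9)

K_naive = np.array([[np.exp(-0.5 * np.sum((X[i] - X[j]) ** 2 / rho ** 2))
                     for j in range(8)] for i in range(8)]) + 1e-9 * np.eye(8)
assert np.allclose(K, K_naive)
```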

Edit: Actually, I just remembered Stan does have some built-in covariance functions. I’m not sure whether using these instead of the manually coded version would be faster, or why the User’s Guide doesn’t use them.

There will be functions for this in the next release (meaning you won’t have to implement user-defined functions for the separate length-scale case).


Much appreciated @ScottAlder, hope to see you around more!


Because in cov_exp_quad the length-scale is interpreted only as a real number :(

Built-in covariance functions are much faster due to more efficient gradient computations. Thanks to hard work by @anon79882417, we’ll soon also have support for separate length scales, plus Matern and periodic covariance functions as built-ins. We should then add new examples to the user guide.


Were these covariance functions ever merged into Stan? I don’t see any reference to them in the current functions documentation.
