Hello! for

Does anyone know how to implement ARD (automatic relevance determination) for prediction e.g.for 2 input vectors x1,x2?

I checked an old implementation but there was only regarding one input vector x…

Thanks

# ARD implementation Gaussian Process

**char**#1

**ScottAlder**#2

Take a look at section 10.3 in the Stan User’s Guide . ARD is about 1/3 into this section.

**ScottAlder**#4

In the ARD section I’m seeing

```
vector[D] x[N];
```

Which is a D dimensional vector with each element being an array of size N. This means you have D inputs, and each input has N observations. I also see

```
vector<lower=0>[D] rho;
```

which is a vector of D length-scale parameters, one for each input.

**char**#5

```
matrix L_cov_exp_quad_ARD(vector[] x,
real alpha,
vector rho,
real delta) {
int N = size(x);
matrix[N, N] K;
real sq_alpha = square(alpha);
for (i in 1:(N-1)) {
K[i, i] = sq_alpha + delta;
for (j in (i + 1):N) {
K[i, j] = sq_alpha
* exp(-0.5 * dot_self((x[i] - x[j]) ./ rho));
K[j, i] = K[i, j];
}
}
K[N, N] = sq_alpha + delta;
return cholesky_decompose(K);
}
```

Here if i have to input vectors instead of one for example x and y i have to put another for loop inside for the parameter rho? I am not sure how to implement it…

Any help appreciated!!

**ScottAlder**#6

Notice:

```
K[i, j] = sq_alpha * exp(-0.5 * dot_self((x[i] - x[j]) ./ rho));
```

`(x[i] - x[j]) ./ rho`

is the difference between inputs `x[i]`

and `x[j]`

, scaled by `rho`

. Notice the `./`

operator, which is elementwise division. This means each input distance is being scaled by its own length-scale `rho`

, with there being D inputs and D length-scales in total. Although TBH I’m a little confused by the indices, wondering if maybe `D`

and `N`

should be switched.

If you want to write out the covariance function for D dimensions without using matrices, its:

k_{SE}(x_i, x_{j \neq i}, ;\alpha, \rho_1, \ldots, \rho_D) = \alpha^2 \exp\left(-0.5\sum_{d=1}^{D} \frac {(x_i^d-x_j^d)^2} {\rho_d^2} \right)

so for a 2D GP, I think the correct Stan implementation is this:

```
functions {
matrix L_cov_exp_quad_ARD(vector x1,
vector x2,
real alpha,
real rho1,
real rho2,
real delta) {
int N = size(x);
matrix[N, N] K;
real sq_alpha = square(alpha);
for (i in 1:(N-1)) {
// Diagonal
K[i, i] = sq_alpha + delta;
for (j in (i + 1):N) {
// Upper triangle
K[i, j] = sq_alpha *
exp(-0.5*(x1[i] - x1[j])^2 / rho1 - 0.5*(x2[i] - x2[j])^2 / rho2));
// Lower triangle
K[j, i] = K[i, j];
}
}
// Bottom-right corner
K[N, N] = sq_alpha + delta;
return cholesky_decompose(K);
}
}
data {
int<lower=1> N;
vector[N] x1;
vector[N] x2;
vector[N] y;
}
transformed data {
real delta = 1e-9;
}
parameters {
real<lower=0> rho1;
real<lower=0> rho2;
real<lower=0> alpha;
real<lower=0> sigma;
vector[N] eta;
}
model {
vector[N] f;
{
matrix[N, N] L_K = L_cov_exp_quad_ARD(x1, x2, alpha, rho1, rho2, delta);
f = L_K * eta;
}
rho1 ~ inv_gamma(5, 5);
rho2 ~ inv_gamma(5, 5);
alpha ~ std_normal();
sigma ~ std_normal();
eta ~ std_normal();
y ~ normal(f, sigma);
}
```

**Kon7**#7

Thanks for the reply.

So this is for real (as a type ) lengthscale right?

And maybe if i have a ``` vector<lower=0>[D] rho```` i must use the same you did but with the ./ operator?

**ScottAlder**#8

Correct. I simplified the 2D model by just having a scalar `real`

parameter for each length-scale. This also requires declaring separate data vectors for each x. However, I think the model that uses matrix/vector operations may be more efficient but I’m not sure.

Written out with separate data vectors for each input x, I don’t think you can use the `./`

operator. You would need to do something like:

```
K[i, j] = sq_alpha *
exp(-0.5*(x1[i] - x1[j])^2 / rho[1] - 0.5*(x2[i] - x2[j])^2 / rho[2]));
```

**char**#9

There must be a more efficient way. or an inside function in stan…i mean if D=10 this would be really slow

**ScottAlder**#10

Exactly. Which is why the manual implements a D-dimensional GP covariance function like this:

```
K[i, j] = sq_alpha * exp(-0.5 * dot_self((x[i] - x[j]) ./ rho));
```

with inputs and length-scales declared as:

```
vector[D] x[N];
...
vector<lower=0>[D] rho;
```

Edit: Actually, I just remembered Stan does have some built in covariance functions. I’m not sure if using these instead of the manually coded version would be faster, or why the User’s Guide doesn’t use these.

**drezap**#11

There will be functions for this in the next release (meaning you don’t have to implement user defined functions for the seperate length scale case).

**avehtari**#14

Builtin covariance functions are much faster due to more efficient gradient computations. Thanks to hard work by @drezap we’ll soon also have the support for separate length scales, plus matern and periodic covariance functions in built in functions. We should then add new examples to the user guide.