Custom distance metric with a Gaussian process

Hi everyone,

I am wondering whether it is possible to use a custom distance metric with a Gaussian process in Stan. I have a classification problem, and the idea is that items which are close together should be assigned to the same class. However, the way of calculating the distance between two items is non-trivial and, as far as I can see, none of the implemented kernels do what I want. So far, I have tried to pass a distance matrix to Stan, and then use those distances instead of the (x_{i} - x_{j})^2 term in the exponentiated quadratic kernel.

This is my code so far:

data {
  int<lower=1> N1;              // N train
  int<lower=1> N2;              // N test
  int<lower=1> N;               // train + test
  int<lower=0, upper=1> x1[N1]; // observations
  matrix[N, N] K;               // precomputed distances
  real<lower=0> gamma1;         // prior hyperparameters
  real<lower=0> gamma2;
}

parameters {
  real<lower=0> rho;
  real<lower=0> alpha;
  real a;
  vector[N] eta;
}

transformed parameters {
  vector[N] f;
  matrix[N, N] L_K;
  matrix[N, N] K2;

  // build the kernel from the precomputed distances;
  // rho acts as an inverse length-scale here
  for (i in 1:N)
    for (j in 1:N)
      K2[i, j] = square(alpha) * exp(-0.5 * rho * K[i, j]);

  // add jitter to the diagonal for numerical stability
  for (n in 1:N)
    K2[n, n] = K2[n, n] + 1e-10;

  L_K = cholesky_decompose(K2);
  f = L_K * eta;

}

model {
  rho ~ inv_gamma(gamma1, gamma2);
  alpha ~ normal(0, 1);
  a ~ normal(0, 1);
  eta ~ normal(0, 1);

  x1 ~ bernoulli_logit(a + f[1:N1]);
}

generated quantities {
  int z2[N];
  for (n in 1:N)
    z2[n] = bernoulli_logit_rng(a + f[n]);
}

The model at least compiles and samples, but the out-of-sample predictions are not very good.

Does this set up make sense? Is there a better way to do this?

Thanks!

Will the posterior parameters be in any way physically meaningful in your model? I.e., will they correspond to some theoretical generative assumption? If not, there are many very good clustering algorithms that work directly off an arbitrary distance matrix (I like affinity propagation, for example).

Thank you for your answer, emiruz.

Will the posterior parameters be in any way physically meaningful in your model? I.e., will they correspond to some theoretical generative assumption?

I do not know exactly what the generative model should be, other than ‘similar items should belong to the same class’. However, methods like KNN do not work very well on this type of task because they only consider the similarity of item x to a small set of neighbors \{y_1, \ldots, y_k\}, whereas class assignment should take into account the similarity of x to all other items. My idea was that a GP could capture this intuition. But maybe not?

I will look into the affinity propagation method you mention.

I guess I’m saying that if what you want is a clustering algorithm, and you don’t care about the values of the model parameters (rho, alpha, a) but just want item classifications, then you’re better off using a state-of-the-art clustering algorithm, many of which work directly off an arbitrary distance matrix. AP is great, but there are many others. Have a look at exemplar-based clustering in general.

I think this is par for the course.

Yep. Here’s code for an ARD exponentiated quadratic from the Stan User’s Guide:

functions {
  matrix L_cov_exp_quad_ARD(vector[] x,
                            real alpha,
                            vector rho,
                            real delta) {
    int N = size(x);
    matrix[N, N] K;
    real sq_alpha = square(alpha);
    for (i in 1:(N - 1)) {
      K[i, i] = sq_alpha + delta;
      for (j in (i + 1):N) {
        K[i, j] = sq_alpha
          * exp(-0.5 * dot_self((x[i] - x[j]) ./ rho));
        K[j, i] = K[i, j];
      }
    }
    K[N, N] = sq_alpha + delta;
    return cholesky_decompose(K);
  }
}

This is using the squared L2 norm (the squared Euclidean distance). You’d want to replace the dot_self(x[i] - x[j]) term with whatever distance metric you want to use.
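For example, here is a sketch of that substitution with the distances passed in as data. The function name L_cov_dist and the matrix D are my own; I’m assuming D holds precomputed squared distances, so dividing by square(rho) mirrors the ARD scaling above. Note that cholesky_decompose will reject at runtime if the resulting matrix is not positive definite, which matters for the discussion below:

functions {
  // Sketch: exponentiated-quadratic-style kernel over a precomputed
  // matrix D of squared distances (assumed symmetric, zero diagonal).
  matrix L_cov_dist(matrix D, real alpha, real rho, real delta) {
    int N = rows(D);
    matrix[N, N] K;
    real sq_alpha = square(alpha);
    for (i in 1:(N - 1)) {
      K[i, i] = sq_alpha + delta;
      for (j in (i + 1):N) {
        K[i, j] = sq_alpha * exp(-0.5 * D[i, j] / square(rho));
        K[j, i] = K[i, j];
      }
    }
    K[N, N] = sq_alpha + delta;
    return cholesky_decompose(K);
  }
}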

This won’t work with the built-in covariance functions: the distance is computed internally in C++ for each of them. The built-in ones are implemented as in GPML, Chapter 4.

I might verify (prove, or check numerically) that the kernel you’ve implemented is real symmetric (K = K^T for K \in \mathbf{R}^{n \times n}) and positive semi-definite (no eigenvalue is less than 0).
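For a rough numerical check (not a proof), something along these lines could go in the model’s transformed data block. The trial value rho_test = 1.0 is arbitrary and my own choice; passing at one rho does not guarantee positive semi-definiteness at every rho:

transformed data {
  // Rough sanity check: does the distance-based kernel give a PSD
  // matrix at a trial length-scale? (Not a proof for all rho.)
  {
    real rho_test = 1.0;  // arbitrary trial value
    matrix[N, N] K_test;
    for (i in 1:N)
      for (j in 1:N)
        K_test[i, j] = exp(-0.5 * rho_test * K[i, j]);
    K_test = 0.5 * (K_test + K_test');  // absorb rounding asymmetry
    if (min(eigenvalues_sym(K_test)) < -1e-8)
      reject("distance matrix does not yield a PSD kernel at rho_test");
  }
}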

Thanks for your answer! I am using a variation on the Levenshtein distance, so I’m not sure it is implementable in Stan, or that it would sample in reasonable time.

This won’t work, distance is defined implicitly in C++ for each covariance function.

I don’t quite understand why I can’t pass the x[i] - x[j] term precomputed as data to the model. This is the bit I compute outside.

The built-in covariance functions compute the distance implicitly. It doesn’t really make sense to take a pre-computed distance and then input it into a function that computes the distance again, right?

You can do whatever you’d like, you just have to be careful to implement the covariance function correctly in the functions block.

Right. I guess a different way of asking the question is: if in your code above I replace

sq_alpha * exp(-0.5 * dot_self((x[i] - x[j]) ./ rho));

with

K[i, j] = sq_alpha * exp(-0.5 * D[i, j] / rho);

where D is a matrix of precomputed distances, would the result be a reasonable thing? Or is this nonsense?

First, it’s important to realize that not all combinations of a metric and a covariance function produce a valid positive definite covariance matrix. I’m not familiar with the Levenshtein distance, so I don’t know if it is valid with cov_exp_quad.

Thank you Aki.

The result is a positive definite covariance matrix, and the model does sample. My worry was that there would be something fundamentally wrong about passing precomputed distances, but I guess not?

You can use a pre-computed distance matrix (it’s just not supported by cov_exp_quad).
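For completeness, here is a minimal sketch of the whole model with those pieces pulled together. It is essentially the original code with the kernel loop vectorized and the stray braces removed; D is the precomputed (squared) distance matrix that was called K in the original post, and the 1e-10 jitter and the inverse-length-scale role of rho are carried over unchanged:

data {
  int<lower=1> N1;              // N train
  int<lower=1> N2;              // N test
  int<lower=1> N;               // train + test (N = N1 + N2)
  int<lower=0, upper=1> x1[N1]; // observed classes for the training items
  matrix[N, N] D;               // precomputed (squared) distances
  real<lower=0> gamma1;         // prior hyperparameters for rho
  real<lower=0> gamma2;
}
parameters {
  real<lower=0> rho;            // inverse length-scale
  real<lower=0> alpha;          // marginal standard deviation
  real a;                       // intercept
  vector[N] eta;                // standard normal draws for the GP
}
transformed parameters {
  vector[N] f;                  // latent GP values
  {
    matrix[N, N] K = square(alpha) * exp(-0.5 * rho * D);
    for (n in 1:N)
      K[n, n] = K[n, n] + 1e-10; // jitter for numerical stability
    f = cholesky_decompose(K) * eta;
  }
}
model {
  rho ~ inv_gamma(gamma1, gamma2);
  alpha ~ normal(0, 1);
  a ~ normal(0, 1);
  eta ~ normal(0, 1);
  x1 ~ bernoulli_logit(a + f[1:N1]);
}
generated quantities {
  int z2[N];                    // posterior predictive classes
  for (n in 1:N)
    z2[n] = bernoulli_logit_rng(a + f[n]);
}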