Gaussian process with hierarchical Gaussian process prior

anon79882417 · October 2, 2018, 2:32am

I think multi_normal_cholesky will give you a speed increase. Usually low NeFF is a result of poor specification, though.

Let’s think about HMC for a second. We’re first using gradients to go to an area of high posterior mass WRT the parameters in your model. We could just be searching some tail of the distribution, that, conditioned on your model, is not able to generate samples well, because, well, there’s nothing there.

Regarding GPs - we generate a covariance matrix with respect to parameters, and we approximate distributions for those parameters. Then, we can generate predictive distributions for the for the kernel we have generated. Because you have a covariance matrix, doesn’t always imply gaussian process. You can really look at covariance between anything and define any covariance function, which could be meaningless.

Here, your covariance matrix is fixed: matrix[N,N] A; For a gaussian process, one might define a covariance function like so. This is called the Squared Exponential kernel by the Stan people, and the Radial Basis Function (RBF) by almost everyone else:

k(x, x') = \sigma^2 exp(\frac{d(x, x')}{2l^2})

So at each iteration your covariance matrix is generated WRT parameters, the magnitude sigma^2 and the length scale l. The magnitude controls deviation vertically (for a 1D) GP, and the length scale controls how much each parameter is related “along a certain axis” meaning, x1 and x2. This implementation isn’t really correct I can see in any way.

Another note, prior distributions on the hyperparameters for GPs is still an open problem. Are you sure this is a good idea? I mean, do whatever you want, but your computational issues could be indicating something. I think Andrew has some saying for that somewhere.

Also, deep GPs are “a thing”. Composite functions, essentially.

With that said, here’s a proper implementation of a gaussian process regression, which can be found in betancourt’s case studies (recommended), among other places.


functions {
  vector gp_pred_rng(vector[] x_pred,
                     vector y1, vector[] x,
                     real magnitude, real length_scale,
                     real sigma) {
    int N = rows(y1);
    int N_pred = size(x_pred);
    vector[N_pred] f2;
    {
      matrix[N, N] K = add_diag(gp_exp_quad_cov(x, magnitude, length_scale),
                                sigma);
      matrix[N, N] L_K = cholesky_decompose(K);
      vector[N] L_K_div_y1 = mdivide_left_tri_low(L_K, y1);
      vector[N] K_div_y1 = mdivide_right_tri_low(L_K_div_y1', L_K)';
      matrix[N, N_pred] k_x_x_pred = gp_exp_quad_cov(x, x_pred, magnitude, length_scale);
      f2 = (k_x_x_pred' * K_div_y1);
    }
    return f2;
  } 
}
data {
  int<lower=1> N;
  int<lower=1> D;
  vector[D] x[N];
  vector[N] y;

  int<lower=1> N_pred;
  vector[D] x_pred[N_pred];
}
transformed data {
  vector[N] mu;
  mu = rep_vector(0, N);
}
parameters {
  real<lower=0> magnitude;
  real<lower=0> length_scale;
  real<lower=0> sig;
  
  real<lower=0> sigma;
}
transformed parameters {
  matrix[N, N] L_K;
  {
    matrix[N, N] K = gp_exp_quad_cov(x, magnitude, length_scale);
    K = add_diag(K, square(sigma));
    L_K = cholesky_decompose(K);
  }
}
model {
  magnitude ~ normal(0, 3);
  length_scale ~ inv_gamma(5, 5);
  sig ~ normal(0, 1);
  
  sigma ~ normal(0, 1);
  
  y ~ multi_normal_cholesky(mu, L_K);
}
generated quantities {
  vector[N_pred] f_pred = gp_pred_rng(x_pred, y, x, magnitude, length_scale, sigma);
  vector[N_pred] y_pred;
  
  for (n in 1:N_pred) y_pred[n] = normal_rng(f_pred[n], sigma);
}

Also, here’s a shameless plug for GPs, one of my many unfinished projects:

Reading this should help with what you’re actually modeling a bit.

Hope this helps!

EDIT: on another note, I was playing with GPs with different noises the other day and HMC/Stan had a lot of trouble with identifiability and some divergences. Terrible posterior geometry.

idk bro good luck

Topic		Replies	Views
Hierarchical GP Modeling	5	1964	October 21, 2017
Problem fitting hierarchical Gaussian process model Modeling gaussian-process	5	988	March 25, 2020
Multivariate Gaussian Process Regression Inference Modeling fitting-issues	8	1444	June 9, 2017
Gaussian Process out-of-sample predictive distribution Modeling gaussian-process	21	2153	December 6, 2024
Likely-silly question on Gaussian Processes: can/should they be scale()'d? Modeling specification	1	766	February 10, 2022

Gaussian process with hierarchical Gaussian process prior

Related topics