How to make a GP with a non-Gaussian likelihood more efficient?

Hello all,

I’m using rstan to make predictions for count data while tuning hyperparameters.

The data can be downloaded here:
gp.pois.data.2d.R (14.5 KB)

The Stan file can be run with:

library(rstan)

data <- read_rdump("gp.pois.data.2d.R")
pred_data <- list(N_observed = data$N, D = 2, y_observed = data$y,
                  N_predict = data$N_predict, x_predict = data$x_predict,
                  observed_idx = data$sample_idx)

fit <- stan(file = "poisson.stan",
            data = pred_data, seed = 5838298, chains = 1)

There are 26 two-dimensional observed inputs x with outputs y, and I want to estimate the posterior predictive distribution at 1296 two-dimensional inputs x. However, even after running the Stan code overnight, sampling is still stuck at 0% of iterations. Does anyone know how to make the estimation faster? Are there any techniques that I could possibly use? Many thanks in advance!

data {
  int<lower=1> N_predict;
  int<lower=1> D;
  vector[D] x_predict[N_predict];

  int<lower=1> N_observed;
  int<lower=1, upper=N_predict> observed_idx[N_observed];
  int y_observed[N_observed];
}

parameters {
  real<lower=0> rho;
  real<lower=0> alpha;
  vector[N_predict] f_tilde;
}

transformed parameters {
  vector[N_predict] log_f_predict;
  {
    matrix[N_predict, N_predict] cov;
    matrix[N_predict, N_predict] L_cov;
    // squared-exponential covariance with jitter for numerical stability
    cov = cov_exp_quad(x_predict, alpha, rho)
          + diag_matrix(rep_vector(1e-10, N_predict));
    L_cov = cholesky_decompose(cov);
    // non-centered parameterization of the latent GP
    log_f_predict = L_cov * f_tilde;
  }
}

model {
  f_tilde ~ normal(0, 1);
  rho ~ inv_gamma(6.8589, 103.582);
  alpha ~ normal(0, 10);
  y_observed ~ poisson_log(log_f_predict[observed_idx]);
}

generated quantities {
  vector[N_predict] f_predict = exp(log_f_predict);
  vector[N_predict] y_predict;
  for (n in 1:N_predict)
    y_predict[n] = poisson_log_rng(log_f_predict[n]);
}

It’s been a while since I last looked at how to do interpolation with GPs in Stan, but as I recall you model based on the dimensionality of the observed data (so cov should be a 26 × 26 matrix), and then in the generated quantities block you construct a larger predict_cov over the desired prediction points.

Thank you for your reply! I followed the “Predictive Inference in non-Gaussian GPs” section in https://mc-stan.org/docs/2_19/stan-users-guide/fit-gp-section.html. I think that in predictive inference the cov matrix in transformed parameters is already 1296 × 1296?

Ah, but the following “Analytical Form of Joint Predictive Inference” section shows a faster variant where you compute only the smaller observed covariance in the model block and construct the larger prediction covariance in the generated quantities block.
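Roughly, for a Gaussian outcome that closed form is just the conditional of a joint multivariate normal. A minimal R sketch, assuming input matrices x_obs and x_pred, observations y_obs, and fixed hyperparameters alpha, rho, sigma (all names hypothetical):

```r
# Analytic GP predictive for a *Gaussian* likelihood; x_obs, x_pred are
# matrices whose rows are input points.
se_cov <- function(xa, xb, alpha, rho) {
  # squared-exponential kernel between the rows of xa and xb
  d2 <- outer(rowSums(xa^2), rowSums(xb^2), "+") - 2 * tcrossprod(xa, xb)
  alpha^2 * exp(-0.5 * d2 / rho^2)
}

K    <- se_cov(x_obs, x_obs, alpha, rho) + diag(sigma^2, nrow(x_obs))
K_s  <- se_cov(x_obs, x_pred, alpha, rho)
K_ss <- se_cov(x_pred, x_pred, alpha, rho)

mu_pred  <- t(K_s) %*% solve(K, y_obs)       # conditional (predictive) mean
cov_pred <- K_ss - t(K_s) %*% solve(K, K_s)  # conditional covariance
```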

Yes, I’m aware of that. But I thought that trick only applies to a Gaussian likelihood, not a Poisson one, since the posterior predictive distribution has no closed form when the likelihood is Poisson?


Hm, you may be right that the analytic interpolation trick is only possible with Gaussian outcomes. This is now beyond my expertise; @avehtari, have any thoughts?


I would use brms’s approximate GP (gpa) approach. It’s detailed here, and can be used for multidimensional GPs as well. It’s already in brms, so you can just do brm(y ~ gp(x1, x2, k = 50), family = …). It is magical: it turns a 2-hour model into a 20-second model, and an intractably large GP into something that takes 2–3 hours. The gist is that it decomposes the RBF kernel into eigenfunctions and eigenvalues; the larger the number of basis functions used (k), the more accurate the approximation. I’ve found that even with 50 it tends to do very well. It’s far faster, nearly as accurate, converges better, and prediction is simpler. brms, again, will handle the prediction side for you.
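For your data, a minimal sketch of that call, assuming a data frame df with a count outcome y and the two inputs named x1 and x2 (names hypothetical):

```r
library(brms)

# Approximate (basis function) GP over the two inputs; k sets the number of
# basis functions per dimension, trading accuracy for speed.
fit <- brm(y ~ gp(x1, x2, k = 50), data = df, family = poisson(),
           chains = 4, cores = 4)
```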


Many thanks for your suggestions! I’ll try this tutorial then.

Correct.

Stan will soon have the possibility of using the Laplace method to integrate over the latent values, which is usually very accurate with a Poisson likelihood and faster than sampling the latent values.

brms has two different GP approximations: s(x1, bs="gp") and gp(x1, k=20). The first is based on a spline basis function representation of a Matérn covariance function GP from the mgcv package, and the latter is based on a basis function approximation via Laplace eigenfunctions (based on Solin & Särkkä, Hilbert space methods for reduced-rank Gaussian process regression; further implementation details will be described in a paper under preparation).

s(x1, bs="gp") works for 1D and 2D.

gp(x1, k=20) works also for 3D, and sometimes up to 4D provided the approximated function is quite smooth. The number of basis functions is k^D.

The current exact GP in brms, gp(x1), does not use the more efficient built-in covariance functions, but for small data it can still be useful for more than 4 dimensions.

Given the “1296 2-dimensional inputs x”, I would use one of the basis function approaches; see the sketch below.
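For concreteness, the two approximations would be specified along these lines; the data frame df and variable names are hypothetical:

```r
library(brms)

# Spline basis representation of a Matern GP via mgcv (works for 1D-2D):
fit_spline <- brm(y ~ s(x1, x2, bs = "gp"), data = df, family = poisson())

# Basis function approximation via Laplace eigenfunctions
# (here k^D = 20^2 = 400 basis functions):
fit_basis <- brm(y ~ gp(x1, x2, k = 20), data = df, family = poisson())
```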

@Stephen_Martin can you describe a bit more what kind of data and models you have had when using gp(x1, x2, k=20) in brms? We are interested in learning about success and failure stories before finishing the paper.


@avehtari I am currently using additive univariate kernels (so, in brms terms: gp(x1) + gp(x2)); there’s a reason for this. I’ve been using generated data, and tested on a couple of others. Typically it’s standard-normal predictors with N between 300 and 1000. I am currently applying it to latent variable models (e.g., in CFA/SEM). Specifically, I have a latent location variable for J indicators and a corresponding latent scale variable for the J indicators’ residual variances, with the latent location variable predicting the latent scale variable via an approximate GP.
This lets us have non-linear functions of reliability over the latent space, à la IRT test information.
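For reference, a bare-bones version of that additive specification (data frame df, outcome y, and predictor names all hypothetical):

```r
library(brms)

# Additive univariate approximate GPs instead of a single 2-D kernel:
fit_add <- brm(y ~ gp(x1, k = 50) + gp(x2, k = 50), data = df,
               family = gaussian())
```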

@Stephen_Martin have you noticed any strange or unexpected behavior?

@avehtari Nope. It may help that my input space is always similar: because the inputs are always N(0,1), my limits are just between -3*(5/2) and 3*(5/2).
Occasionally there are convergence problems (in terms of Rhats), but certainly less often than when using an exact GP.
What sort of behavior should I be on the lookout for?


I’m looking for any behaviour that would bother you, but I’m happy if you haven’t seen any problems.