Scalable Gaussian process inference

Thank you for the feedback on the manuscript. We’ve uploaded a second version to the arXiv.

  • As suggested by @mike-lawrence (thank you!), we added a discussion, with supporting simulations in the appendix, of how much padding is required to overcome boundary effects. In short, two correlation lengths of padding are likely sufficient for squared exponential and Matérn 3/2 kernels, although the specifics of course depend on the details of the model. We also included simulation-based calibration for the Fourier approach to provide a slightly different perspective on the required padding. If the forward model is a regular GP and we fit using Fourier methods without padding, the posterior quantiles differ substantially from the uniform distribution we expect. With padding equal to one correlation length, the ranks become difficult to distinguish from a uniform distribution. A minimal sketch of how padding can be implemented follows this list.
  • We included a brief discussion of the non-uniform FFT mentioned by @pgree and the Hilbert-space approximation mentioned by @avehtari.
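To make the padding concrete, here is a minimal sketch in the style of the gptools Stan interface (it reuses the function names from the Poisson example further down this thread, but treat it as an illustration rather than a verbatim excerpt from the package): the GP lives on a padded domain of m = n + n_pad points, and only the first n points enter the likelihood.

functions {
    #include gptools/util.stan
    #include gptools/fft1.stan
}

data {
    int n;                      // number of observed grid points
    int n_pad;                  // padding, e.g., two correlation lengths' worth of points
    vector[n] y;                // observations on a regular grid
}

transformed data {
    int m = n + n_pad;          // size of the padded domain
}

parameters {
    real<lower=0> sigma;
    real<lower=0> length_scale;
    real<lower=0> kappa;        // observation noise scale
    vector[m] z;                // white noise on the padded domain
}

transformed parameters {
    // Kernel Fourier coefficients and GP realization on the padded domain.
    vector[m %/% 2 + 1] cov_rfft = gp_periodic_matern_cov_rfft(1.5, m, sigma, length_scale, m);
    vector[m] f = gp_inv_rfft(z, zeros_vector(m), cov_rfft);
}

model {
    z ~ std_normal();
    sigma ~ std_normal();
    length_scale ~ inv_gamma(5, 5);
    kappa ~ std_normal();
    // Only the first n points are observed; the padded tail is unconstrained
    // and absorbs the artificial correlation between the two ends of the domain.
    y ~ normal(head(f, n), kappa);
}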

We felt a detailed comparison of the Hilbert-space approximations (HSA) and the Fourier methods (FM) was a bit much for the discussion, but here is some more information. In short, the methods are conceptually similar but differ in the details.

Both use the eigenfunctions of the Laplace operator to decompose the signal but employ different boundary conditions. HSA uses Dirichlet boundary conditions (the process is pinned to zero at the boundary; see eq. (5) of https://arxiv.org/pdf/2004.11408.pdf), whereas FM uses periodic boundary conditions. HSA has “half” the eigenfunctions of FM because of the Dirichlet boundary conditions: decomposing a signal in the HSA basis amounts to a sine transform (there are no cosine terms because cosines do not vanish at the boundary), whereas FM of course gives us the full Fourier transform. This might suggest that FM can be more expressive given the same number of basis functions. For example, many sine basis functions would be required to capture a GP that is non-zero near the boundary while also modeling the rest of the signal. Both methods are misspecified in the sense that the boundary conditions are not consistent with a standard GP model, and the domain must be extended for a good approximation (discussed in section 4 of the HSA paper and in the appendix of our updated draft).
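Concretely, on a domain of length $L$, the two basis sets are (my notation, just for orientation):

$$\text{HSA (Dirichlet):} \quad \phi_j(x) = \sqrt{2 / L} \, \sin\left(\frac{\pi j x}{L}\right), \quad j = 1, 2, \ldots$$

$$\text{FM (periodic):} \quad \psi_j(x) = \exp\left(\frac{2 \pi i j x}{L}\right), \quad j = 0, \pm 1, \pm 2, \ldots$$

Every $\phi_j$ vanishes at $x = 0$ and $x = L$, whereas the $\psi_j$ come in sine–cosine pairs at each frequency, which is the sense in which HSA has “half” the eigenfunctions of FM.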

A few advantages of each (provided I understood correctly):

  • HSA allows for non-uniform sampling locations by evaluating the basis functions explicitly.
  • Once we have accepted the misspecification due to the boundary conditions, FM can evaluate the likelihood exactly in $n \log n$ time for any regularly spaced signal.
  • Centered parameterizations are a little easier with FM because the FFT is invertible (see the sketch below this list). This is more difficult with HSA because we would have to evaluate all $n$ basis functions to reconstruct the signal exactly.
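To make the parameterization point concrete: the Poisson example later in this thread uses the non-centered form (white noise z mapped through gp_inv_rfft), and a centered counterpart might look like the sketch below, assuming the gp_rfft density that the example's comments allude to (again a sketch, not a verbatim excerpt from the package).

functions {
    #include gptools/util.stan
    #include gptools/fft1.stan
}

data {
    int n;
    vector[n] y;
}

parameters {
    real<lower=0> sigma;
    real<lower=0> length_scale;
    real<lower=0> kappa;        // observation noise scale
    vector[n] f;                // centered: the GP realization itself is a parameter
}

transformed parameters {
    vector[n %/% 2 + 1] cov_rfft = gp_periodic_matern_cov_rfft(1.5, n, sigma, length_scale, n);
}

model {
    // The FFT evaluates the GP log density of f directly; this is cheap
    // because the transform is invertible, whereas a truncated HSA basis
    // cannot reconstruct f exactly.
    f ~ gp_rfft(zeros_vector(n), cov_rfft);
    sigma ~ std_normal();
    length_scale ~ inv_gamma(5, 5);
    kappa ~ std_normal();
    y ~ normal(f, kappa);
}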

Luckily, we can get the best of both worlds: use FFTs if the signal is regularly spaced, and use explicit basis functions (maybe with periodic rather than Dirichlet boundary conditions for more expressiveness) if the observations are irregularly spaced.
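For the irregularly spaced case, the "explicit basis function" route could look something like this hypothetical fragment, where x, L, and k (observation locations, padded domain length, and number of frequencies) are illustrative names:

data {
    int n_obs;
    int k;                      // number of frequencies to keep
    real<lower=0> L;            // length of the (padded) domain
    vector[n_obs] x;            // irregularly spaced observation locations
}

transformed data {
    // Periodic (Fourier) basis evaluated explicitly at the observation locations.
    matrix[n_obs, 2 * k + 1] Phi;
    for (i in 1:n_obs) {
        Phi[i, 1] = 1;
        for (j in 1:k) {
            Phi[i, 2 * j] = cos(2 * pi() * j * x[i] / L);
            Phi[i, 2 * j + 1] = sin(2 * pi() * j * x[i] / L);
        }
    }
}

parameters {
    vector[2 * k + 1] beta;     // basis coefficients
}

transformed parameters {
    vector[n_obs] f = Phi * beta;   // GP values at the irregular locations
}

model {
    // Illustrative prior; in practice the scale of each coefficient would
    // come from the kernel's spectral density at the corresponding frequency.
    beta ~ std_normal();
}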


> This might suggest that FM can be more expressive given the same number of basis functions.

I disagree on this. If this were true, FM would not need padding. Both approaches need padding, and I expect the expressiveness to be similar given the same number of basis functions, but we can know for sure only after someone runs the experiments.

> Centered parameterizations are a little easier with FM because the FFT is invertible.

I don’t understand this point. HSA given the basis functions is just a linear model, and we can switch between centered and non-centered parameterizations trivially. Unless you mean something else by “centered parameterizations” here.

If you someday have extra time, the birthday example has equidistant time points (days), enough observations to be challenging for a covariance-matrix-based approach, and a non-trivial combination of many GPs. It would be great to see that run even faster.


I’d be interested to see that too!

I do suspect something more specific is meant here? Maybe posteriors with super informative likelihoods, which might make the posterior look weird for the HSA?

In any case, the centeredness of the HSA can easily be tuned automatically on a per-basis-function basis, while FM, AFAIK, only allows “monolithic” centering/non-centering?
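To illustrate, here is a hypothetical sketch of per-coefficient centering; s (the prior scales of the basis coefficients, e.g., from the kernel's spectral density) and c (a per-coefficient centeredness in [0, 1]) are illustrative names, not part of any package:

data {
    int m;                            // number of basis functions
    vector<lower=0>[m] s;             // prior scales of the coefficients
    vector<lower=0, upper=1>[m] c;    // 1 = fully centered, 0 = fully non-centered
}

parameters {
    vector[m] beta_tilde;
}

transformed parameters {
    // Rescale so that beta has prior scale s regardless of c.
    vector[m] beta = pow(s, 1 - c) .* beta_tilde;
}

model {
    // beta_tilde has scale s^c: c = 1 recovers beta ~ normal(0, s) directly,
    // and c = 0 recovers the standard non-centered beta_tilde ~ std_normal().
    beta_tilde ~ normal(0, pow(s, c));
}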

Fair point. I’ll try to run some experiments, expand a bit on the centered/non-centered parameterization, and update the thread once I have the results. Thanks for the birthday pointer as well; I’ll try to add that to the examples.


Looking at the gptools repo and relevant examples, I don’t see anywhere that a generated quantities block is used.

@tillahoffmann is there an example you can share where we compute the GP on new data, given the fitted parameters for the covariance in the frequency domain? I am trying to grok how we might extend the gp_inv_rfft2 function to do this, but wanted to ask if you had already done this.

@mike-lawrence
How can this be achieved in generated quantities for say a time series? In particular, should I include both training and testing data in the full model, but mask the testing data?

Does this preclude using a standalone generated quantities for new testing data for these models?

Any advice would be much appreciated!

A current extension to the Poisson example might look like

// Gaussian process with log link for Poisson observations using the default centered
// parameterization.
functions {
    #include gptools/util.stan
    #include gptools/fft1.stan
}

data {
    // Information about observation locations and counts.
    int n;
    int n_train;
    array[n] int y;
    array[n_train] int<lower=1, upper=n> n_train_indices;
}
transformed data {
    real nu = 3./2.;
}

parameters {
    // Kernel parameters.
    real<lower=0> sigma;
    real<lower=0> length_scale;
    real<lower=0> epsilon;
    vector[n] z;
}

transformed parameters {
    vector[n] eta;
    vector[n %/% 2 + 1] cov_rfft = gp_periodic_matern_cov_rfft(nu, n, sigma, length_scale, n) + epsilon;
    eta = gp_inv_rfft(z, zeros_vector(n), cov_rfft);
}

model {
    // White noise prior (implies eta ~ gp_rfft(...)) and observation model.
    z ~ std_normal();
    sigma ~ std_normal();
    length_scale ~ inv_gamma(5,5);
    epsilon ~ std_normal();

    y[n_train_indices] ~ poisson_log(eta[n_train_indices]);
}

generated quantities {
}

but ideally, the API would have a populated generated quantities block that would compute a “test” eta for use on test locations…

Good question! I’ve always expanded the domain to make predictions, as you suggested, with masking. Say we have n_obs observations and want to make n_pred predictions; then we can sample a vector [n_obs + n_pred] z of random variables in the Fourier domain and only observe the n_obs values in the model definition. That’s what we did here for prediction using gp_inv_rfft2.
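For the Poisson example above, eta is already defined over the full domain, so the held-out entries are the predictions, and the generated quantities block might be as simple as the following sketch:

generated quantities {
    // Posterior predictive counts over the full domain; entries at indices
    // not in n_train_indices are predictions for the held-out locations.
    array[n] int y_pred = poisson_log_rng(eta);
}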

I haven’t thought about it very carefully, but I imagine it is difficult to sample in the Fourier domain in the generated quantities block. In particular, if we have samples y_obs of the Gaussian process at the locations where the GP was observed, we need to sample all n_pred + n_obs Fourier coefficients to make predictions because each basis function affects y on the entire domain. But conditioning on y_obs imposes a hard constraint on the Fourier coefficients, and we end up with a degenerate multivariate normal distribution whose covariance is an [n_obs + n_pred, n_obs + n_pred] matrix, which is worse than using the standard conditioning approach.

Maybe @avehtari has some ideas based on the Hilbert space GPs.

Okay, this is also what I was thinking. Thanks for responding!