Gaussian process regression

Hello,

I have a GPR model with a squared exponential kernel. The design matrix is 5000 x 14 (5000 observations of 14 continuous variables). Is it possible to fit such a model in Stan, or is it better to use GPstuff?

Maybe @avehtari could help?

Thank you


There's not enough information for a complete answer. In special cases it is possible, e.g. (1) a normal observation model for the target with a single length scale (it might even work with several length scales if done carefully), or (2) an additive model with basis functions; these are even faster with optimization. In the general case, autodiff may require too much memory, or dynamic HMC integrating over the latent values can be very slow.


Are the continuous predictors on a grid at all? If so, see the Kronecker product trick from @avehtari and colleagues here.
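(A rough sketch of the idea, with my own notation rather than the paper's: if the inputs form a full Cartesian grid over the D dimensions and the kernel is a product over dimensions, as the squared exponential is, the covariance matrix factorizes as a Kronecker product

K = K_1 \otimes K_2 \otimes \cdots \otimes K_D,

where K_d is the n_d \times n_d covariance matrix over the d-th axis of the grid. Factorizations then only need to be done on the small K_d, so solves and log-determinants cost on the order of \sum_d n_d^3 instead of (\prod_d n_d)^3.)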


Thank you. I would like to implement (2). Is there an example I can look at? Do I need a GPU?


There is an additive GP with basis functions in the case study Birthdays (there's also a link there to the paper describing the approach in detail).
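Very roughly, the approach approximates each additive GP component with a modest number of basis functions (this is a loose sketch of the construction, not a quote from the case study):

f(x) \approx \sum_{j=1}^{m} \sqrt{S_\theta\!\left(\sqrt{\lambda_j}\right)} \, \phi_j(x) \, \beta_j, \qquad \beta_j \sim \text{Normal}(0, 1),

where the \phi_j and \lambda_j are fixed eigenfunctions and eigenvalues of the Laplacian on an interval and S_\theta is the spectral density of the squared exponential kernel. The latent GP thereby becomes a linear model in the \beta_j, which is what makes it fast.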

No, a GPU is not needed.


Thank you. Is there a way to do something similar to ARD in this case? The problem I have is really explainable AI (XAI). One approach is to use SHAP values to establish the importance of features. I would like to design something Bayesian using a Gaussian process.

We’ve done feature importance estimation using Gaussian processes, see Uncertainty-aware Sensitivity Analysis Using Rényi Divergences (and an older version, Gaussian processes via sensitivity analysis of the posterior predictive distribution). Were you thinking of something like that, or something else?


I guess. I was thinking about having a Gaussian process model for my system and using ARD to determine the important variables. But it seems my problem is too large for a latent Gaussian process, so the second paper might be a good solution. Do you have Stan or R code to share?

Question. Do you scale the data? Let's say the first column of your design matrix ranges from 0 to 1 while the second column ranges from 0 to 100. Do you want to scale all columns to a common scale, e.g. -1 to 1, when calculating distances for the radial basis function kernel?

It depends on the data, but usually I would normalize continuous covariates to have mean = 0 and sd = 1, so that it's easier to define the priors for the lengthscales. Depending on additional domain-specific information, other scalings can be better and can make the covariates easier to compare.
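As a minimal illustration (my own sketch, not from any of the linked material), the standardization can also be done inside Stan's transformed data block so that the lengthscale priors always see covariates with mean 0 and sd 1:

```stan
data {
  int<lower=1> N;          // number of observations, e.g. 5000
  int<lower=1> D;          // number of continuous covariates, e.g. 14
  matrix[N, D] X;          // raw design matrix
}
transformed data {
  matrix[N, D] X_std;      // standardized covariates
  for (d in 1:D)
    X_std[, d] = (X[, d] - mean(X[, d])) / sd(X[, d]);
}
```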

Thank you. The original problem was to find important variables, and my intention was to use ARD for this. However, for 5000 observations and 14 variables with a binomial likelihood, ARD was too slow.
My understanding is that the algorithms in both articles require estimating the posterior predictive distribution (PPD). If I have a squared exponential kernel with different length scales, I gain nothing with respect to computation time. Perhaps if I instead use a radial basis function kernel to estimate the PPD at the training points with Stan, I could reduce the computational burden, and then apply the ideas from the older paper. Right? Or do you have any other suggestions?

ARD is also a poor measure of relevance, as it mostly measures non-linearity; see [1510.04813] Projection predictive model selection for Gaussian processes (published as Projection predictive model selection for Gaussian processes | IEEE Conference Publication | IEEE Xplore).

With that ratio of observations to variables, there is no need for MCMC. Use something like GPy, GPflow, or, in R, CRAN - Package gplite.

Sorry, I don’t understand what you are saying.

If you want to model all interactions, with a binomial observation model, 5000 observations, and 14 variables, use something other than Stan.

If you are happy with an additive model, with a binomial observation model, 5000 observations, and 14 variables, you can also try the basis-function-approximated GP in Stan (it's probably good to use the glm compound functions for a speedup); see the sketch below.
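For concreteness, here is a rough sketch of such an additive basis-function GP with a binomial outcome, adapted from the Hilbert-space construction used in the Birthdays case study. The function name diag_spd_se, the number of basis functions M, the boundary L, and all priors are illustrative assumptions rather than a tested implementation, and it uses plain binomial_logit rather than a glm compound function:

```stan
functions {
  // sqrt of the spectral density of the squared exponential kernel
  // evaluated at the basis function frequencies
  vector diag_spd_se(real alpha, real rho, vector sqrt_lambda) {
    return alpha * sqrt(sqrt(2 * pi()) * rho)
           * exp(-0.25 * square(rho * sqrt_lambda));
  }
}
data {
  int<lower=1> N;                    // observations, e.g. 5000
  int<lower=1> D;                    // covariates, e.g. 14
  matrix[N, D] X;                    // standardized covariates
  array[N] int<lower=0, upper=5> y;  // binomial counts out of 5 trials
  int<lower=1> M;                    // basis functions per covariate, e.g. 30
  real<lower=0> L;                   // boundary, e.g. a multiple of max |X|
}
transformed data {
  vector[M] sqrt_lambda;             // basis function frequencies
  array[D] matrix[N, M] PHI;         // eigenfunctions evaluated per covariate
  for (m in 1:M)
    sqrt_lambda[m] = m * pi() / (2 * L);
  for (d in 1:D)
    for (m in 1:M)
      PHI[d][, m] = sqrt(1 / L) * sin(sqrt_lambda[m] * (X[, d] + L));
}
parameters {
  real intercept;
  vector<lower=0>[D] rho;            // per-covariate lengthscales (ARD-like)
  vector<lower=0>[D] alpha;          // per-covariate magnitudes
  array[D] vector[M] beta;           // basis function weights
}
model {
  vector[N] f = rep_vector(intercept, N);
  for (d in 1:D) {
    // additive component for covariate d as a linear model in beta[d]
    f += PHI[d] * (diag_spd_se(alpha[d], rho[d], sqrt_lambda) .* beta[d]);
    beta[d] ~ std_normal();
  }
  rho ~ lognormal(0, 1);
  alpha ~ normal(0, 1);
  intercept ~ normal(0, 1.5);
  y ~ binomial_logit(5, f);          // latent values on the logit scale
}
```

The per-covariate lengthscales rho give an ARD-like parameterization, with the caveat mentioned earlier in the thread about interpreting lengthscales as relevance.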

Can you please explain why?

Can you please provide a link to the basis function approximated GP?

Based on the information you gave about the model, the conditional posterior of the latent values is well approximated with a normal distribution, so the latent variables can be efficiently integrated out with Laplace, EP, or VI. The remaining marginal posterior of the covariance function parameters is low dimensional and close to normal, and for variable relevance purposes, finding the maximum of that marginal posterior with optimization would be sufficient. The packages I mentioned do this kind of inference efficiently. If you had a more complicated model or much smaller data, then MCMC could produce more accurate inference.
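For reference, the Laplace version of what is described above approximates the log marginal likelihood of the covariance function parameters \theta as (following Chapter 3 of Gaussian Processes for Machine Learning, with notation only sketched here):

\log p(y \mid \theta) \approx \log p(y \mid \hat{f}) - \tfrac{1}{2} \hat{f}^\top K^{-1} \hat{f} - \tfrac{1}{2} \log \left| I + W^{1/2} K W^{1/2} \right|,

where \hat{f} is the mode of p(f \mid y, \theta), K = K(x \mid \theta), and W is the negative Hessian of the log likelihood at \hat{f}. This low-dimensional objective is what the GP packages optimize (or explore) directly.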

Yes. See the earlier message in this thread: Gaussian process regression - #5 by avehtari

The original problem can be posed as a latent variable GP:
y_i \sim \text{Binomial}(5, p_i)
p \sim \text{MultivariateNormal}(0, K(x \mid \theta))
I would like to estimate the posterior of p as well as determine the important variables. There are 5000 observations and the vector x has 14 components.
I'm still struggling with how to approximate the binomial outcome with a normal outcome.

The problem I have can be posed as a latent variable GP:
y_i \sim \text{Binomial}(5, p_i), \quad i = 1, \dots, 5000
p \sim \text{MultivariateNormal}(0, K(x \mid \theta))
I would like to estimate the posterior of p as well as determine the important variables. The vector x has 14 components.
Can you please explain how to integrate out the latent variables with variational inference to get the marginal posterior of the covariance function parameters? Can I use Stan to perform the integration?
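For concreteness, here is a minimal Stan sketch of that latent-variable formulation, with the latent values f placed on the logit scale rather than modelling p directly, per-covariate lengthscales, and illustrative priors and jitter (it assumes the gp_exp_quad_cov signature that takes a vector of lengthscales). It also shows where the cost comes from: every gradient evaluation needs a Cholesky factorization of a 5000 x 5000 matrix.

```stan
data {
  int<lower=1> N;                    // 5000 observations
  int<lower=1> D;                    // 14 covariates
  array[N] vector[D] x;              // covariate vectors
  array[N] int<lower=0, upper=5> y;  // binomial counts out of 5 trials
}
parameters {
  real<lower=0> alpha;               // marginal standard deviation
  vector<lower=0>[D] rho;            // one lengthscale per covariate (ARD)
  vector[N] eta;                     // standardized latent values
}
transformed parameters {
  vector[N] f;
  {
    matrix[N, N] K = gp_exp_quad_cov(x, alpha, rho);
    for (n in 1:N)
      K[n, n] += 1e-8;               // jitter for numerical stability
    f = cholesky_decompose(K) * eta; // non-centered: f ~ MVN(0, K)
  }
}
model {
  alpha ~ normal(0, 1);
  rho ~ lognormal(0, 1);
  eta ~ std_normal();
  y ~ binomial_logit(5, f);          // latent values on the logit scale
}
```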

This is what I expected. This is the very basic form of a Gaussian process, and there are many software packages that can handle it much faster than Stan.

See Chapter 3 of Gaussian Processes for Machine Learning and, e.g., Scalable Gaussian process inference using variational methods.

There are many GP specific software packages implementing these.

You can currently use Stan MCMC to perform the integration, but it is much slower than GP-specific software (in the future it will be less slow, but still slower than GP-specific software).

There is also a branch of Stan that implements an embedded Laplace approximation to integrate over the latent values, see Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent Gaussian models and beyond, but that may require some work to get running, and since your model is this simple, Stan doesn't provide any additional benefit over faster GP-specific software.

When you refer to “many software packages” doing faster GP inference, are you talking, for example, about INLA, or do you have other software in mind?


INLA is good for Gaussian Markov random field priors, but not for Gaussian process priors beyond 2 dimensions. I did list these packages (GPy, GPflow, gplite) above, but there are several others, too.
