Gaussian process regression

oh! sorry for flooding @avehtari ! I did miss your list of software! My bad…
thanks!

Thank you. I have tried gplite but it is almost impossible to guess initial parameters for optimization for my problem with 14 variables. Do you have any suggestions of more robust packages to use?

Interesting. In my experience initialization problems are really rare (my experience is mostly with gpstuff, and gplite has a quite similar implementation). Did you scale your data?
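If not, here is a minimal sketch of what I mean (the toy X and y below are placeholders for your own data; scale() is base R, nothing gplite-specific):

library(gplite)

# Toy stand-in for a 14-variable predictor matrix and a binary response
# (replace these with your own X and y).
X <- matrix(rnorm(100 * 14), 100, 14)
y <- rbinom(100, 1, 0.5)

# Standardize each column of X to zero mean and unit variance.
X_scaled <- scale(X)

# Fit on the scaled inputs.
gp <- gp_init(cf_sexp(), lik_bernoulli())
gp <- gp_optim(gp, X_scaled, y)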

This code leads to an infeasible starting point. Any suggestions on how to get a feasible starting point?

cf <- cf_sexp(normalize = TRUE)
lik <- lik_bernoulli()
gp <- gp_init(cf, lik)
gp <- gp_optim(gp, X, y, trials = trials)

gplite offers a way of finding the posterior mode via gp_optim, but does not currently provide means for sampling or otherwise approximating the full posterior. How would you suggest determining the importance of features? The procedure in the article Variable selection for Gaussian processes via sensitivity analysis of the posterior predictive distribution is not directly applicable. Besides, gplite allows only one length scale.

You are correct. As I said before, based on the information you provided, no sampling is needed and sampling would be slow. Optimization should be fine in your case.

Why not? It’s using optimization (not sampling).

That doesn’t prevent you from using the approach presented in the paper.

There’s also an improved method described in a paper I mentioned earlier in this thread, Uncertainty-aware Sensitivity Analysis Using Rényi Divergences. That method also works with optimization and a single length scale, and it works better than ARD and SHAP, which are all reasons why I recommended it to you in April.

Thanks a lot for the explanation. Suppose I would like to use the KL measure of relevance. Would you please guide me on how to calculate equation (2) for the binomial likelihood? In the paper there are only examples of the univariate normal and Bernoulli likelihoods. Sorry, my statistical background is a bit shaky. The formula for the binomial case is given in probability - Kullback-Leibler divergence for binomial distributions P and Q - Mathematics Stack Exchange, but I still don’t understand how to calculate p and q. My guess is that p is determined by optimization. Right? But then how do I calculate the probability at the perturbed point?

I realized that the code for the paper wasn’t yet in the git repo mentioned in the paper. @topipa will add the code soon, and we’ll mention it here when the code is available.

First bits of code are now in the repo: github.com/topipa/rsens-paper

I will be adding more functionality in the near future. Feel free to raise issues for any bugs or missing features!

I was thinking about http://proceedings.mlr.press/v89/paananen19a/paananen19a-supp.pdf
Would you please explain how to calculate \pi_* and \pi_{*, \Delta_j} ?

Thanks a lot for the code. Is it possible to add a binomial likelihood? Alternatively, can you advise how to modify the rsens code in the gpytorch folder to implement a binomial likelihood?

I have not yet implemented the analytical expression for the binomial likelihood. It is something I will consider in the future. In the meantime, you can use the finite-difference alternative. You just need a way to compute the success probability after optimising the GP, and then you can compute the KL divergence between the unperturbed and perturbed points.
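For reference (this is just the standard result behind the Stack Exchange link above, not something specific to the paper), the KL divergence between two binomial distributions with the same number of trials n is

KL(\mathrm{Bin}(n, p) \,\|\, \mathrm{Bin}(n, q)) = n \left[ p \log \frac{p}{q} + (1 - p) \log \frac{1 - p}{1 - q} \right],

where p is the predicted success probability at the original point and q is the predicted success probability at the perturbed point (if I read the supplementary material's notation correctly, these play the roles of \pi_* and \pi_{*,\Delta_j}). This is exactly what the kld line in the code below computes.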

I’m not very familiar with gplite, but as far as I understand it uses quadrature integration to compute the success probability. Here’s an example with the binomial likelihood (adapted from the gplite tutorial):

library(gplite)

sigmoid <- function(x) {
  1 / (1 + exp(-x))
}

# Generate some toy data
set.seed(32004)
trials <- 10
d <- 3
n <- 250
sigma <- 0.1
x <- matrix(rnorm(n * d), n, d)
ycont <- 0.5 * x[, 1] + x[, 2] + rnorm(n) * sigma
p <- sigmoid(ycont)
y <- rbinom(n, rep(trials, n), p)

# Fit the model using the Laplace approximation
lik <- lik_binomial()
cf <- cf_sexp()
gp <- gp_init(cf, lik)
gp <- gp_optim(gp, x, y, trials = trials)

# Compute finite-difference KL sensitivity values: perturb one input
# dimension at a time and compare the predicted success probabilities
# before and after the perturbation.
pert <- 0.001
pertm <- matrix(0, n, d)
kl <- matrix(0, n, d)
ps <- gp_pred(gp, x, transform = TRUE)   # success probabilities at the original points
for (dd in seq(d)) {
  pertm[, dd] <- pert
  ps_pert <- gp_pred(gp, x + pertm, transform = TRUE)   # probabilities at the perturbed points
  # KL divergence between Binomial(trials, p) and Binomial(trials, q)
  kld <- log(ps$mean / ps_pert$mean) * trials * ps$mean +
    log((1 - ps$mean) / (1 - ps_pert$mean)) * trials * (1 - ps$mean)
  # sensitivity measure: sqrt(2 * KL) scaled by the perturbation size
  kl[, dd] <- sqrt(2 * kld) / pert
  pertm[, dd] <- 0
}

# importance of the variables
colMeans(kl)

Thank you very much.

Can you please share code in Python to calculate the importance of features using KL or Rényi divergences?

Here is our repository with Python code (using GPy) for the finite-difference method. The repository for the Rényi divergence method (using gpytorch) is found here. Is either of these useful for your use case?

Is it possible to go one step further and determine the uncertainty in the feature importances?

Question for @avehtari. What about a more complicated model:
CL \sim \mathrm{MultivariateNormal}(0, K_{CL}(x \mid \theta_{CL}))
Vd \sim \mathrm{MultivariateNormal}(0, K_{Vd}(x \mid \theta_{Vd}))
y \sim \mathrm{normal}(f(CL, Vd), \sigma^2)
Is there a way to determine important features? Can I use the KL divergence approach? Here K_{CL} and K_{Vd} are Gaussian process kernels and f is a function of CL and Vd.

How many features?

In this case we have around 10 features. In addition, based on physics we know that CL is not influenced by some of the features, Vd is not influenced by some other features, and some features, such as weight, influence both CL and Vd.

Yes. With 10 features, it should be feasible.
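A minimal sketch of how the finite-difference recipe above could carry over to this model, assuming a hypothetical predict_mean(model, x) that returns the predictive mean of f(CL, Vd) at the inputs x, plus an estimate sigma of the residual standard deviation (both are placeholders for your own model’s predictive machinery, not functions from gplite or any other package):

# Hypothetical finite-difference KL sensitivity for y ~ normal(f(CL, Vd), sigma^2).
# predict_mean() and sigma are placeholders, not part of any package.
pert <- 0.001
n <- nrow(x)
d <- ncol(x)
kl <- matrix(0, n, d)
mu <- predict_mean(model, x)                  # predictive mean at the original inputs
for (dd in seq(d)) {
  pertm <- matrix(0, n, d)
  pertm[, dd] <- pert
  mu_pert <- predict_mean(model, x + pertm)   # predictive mean at the perturbed inputs
  # KL between two normals with equal variance: (mu - mu_pert)^2 / (2 * sigma^2)
  kld <- (mu - mu_pert)^2 / (2 * sigma^2)
  kl[, dd] <- sqrt(2 * kld) / pert
}
colMeans(kl)   # average sensitivity per feature

The per-feature structure you describe (some features entering only K_{CL}, others only K_{Vd}) does not change the recipe itself: the perturbation is applied to the inputs, and the divergence is measured on the predictive distribution of y.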